version: v0.1.41 #891
Conversation
- Resolve issue with .venv being broken when using docker compose with Cognee (co-authored by Boris Arzentar)
- … 1947 (#760) (no description provided; co-authored by Boris and Igor Ilic)
- Add support for UV and for Poetry package management
- Switch typing from str to UUID for NetworkX node_id
- Add both sse and stdio support for Cognee MCP
- …83] (#782) Add log handling options for cognee exceptions
- Fix issue with failing versions gh actions
- (no description provided; co-authored by Vasilije)
- (no description provided)
- (no description provided; co-authored by Vasilije)
- (no description provided; co-authored by Hande and Vasilije)
- (no description provided; co-authored by Hande and Vasilije)
- Add support for the Memgraph graph database following the [graph database integration guide](https://docs.cognee.ai/contributing/adding-providers/graph-db/graph-database-integration): implemented `MemgraphAdapter`, updated `get_graph_engine.py` to return `MemgraphAdapter` when appropriate, added a test script `test_memgraph.py`, and created a dedicated test workflow `.github/workflows/test_memgraph.yml` (co-authored by Vasilije and Boris)
- refactor: Handle boto3 s3fs dependencies better
- (no description provided)
- Update LanceDB and rewrite data points to run async (co-authored by Boris and Boris Arzentar)
- (no description provided)
- (no description provided)
- Add a short demo, discussed with @hande-k and Lazar, illustrating how to get the PageRank rankings from the knowledge graph with the NetworkX engine; a POC and a first step toward solving #643 (co-authored by Boris, Hande, and Vasilije)
- Added tools to check current cognify and codify status
- (no description provided)
- …exist case: Fixes pipeline run status migration
- Fixes graph completion limit
- Adds modal parallel evaluation for retriever development
- Set the parallel option to None in Fastembed's embedding function
- (no description provided; co-authored by Igor Ilic)
- Adds dashboard application to parallel modal evals to enable fast retriever development/evaluation (co-authored by lxobr)
- (no description provided)
- Removes hardcoded user prompts from adapters (co-authored by lxobr)
- Adds chain of thought retriever
- Adds context extension search
- (no description provided; co-authored by Igor Ilic)
- (no description provided; co-authored by Igor Ilic)
- Add info about installing Cognee locally
- Adds subgraph retriever to graph based completion searches
- Removes ontology resolver initialization at import
- (no description provided)
- (no description provided; co-authored by Vasilije)
- (no description provided)
- Removes graph metrics calculation from dynamic steps and ontology demos
- Removes unused properties from node and edge pydantic models (co-authored by Boris)
- (no description provided)
- (no description provided)

Each commit carries the standard DCO affirmation from the pull request template, affirming that all code conforms to the terms of the Topoteretes Developer Certificate of Origin.
Caution: Review failed. The pull request is closed.

Walkthrough

This update introduces major enhancements across the codebase, including support for new graph and vector database providers, expanded retriever and search functionality with node type and name filtering, new retriever classes, OpenAI-compatible API endpoints, improved pipeline execution with context propagation, and comprehensive documentation. Numerous bug fixes, test additions, and code refactoring are also included.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant User
participant API
participant ResponsesRouter
participant OpenAI
participant ToolDispatcher
participant Retriever
participant DB
User->>API: POST /api/v1/responses (input, tools, ...)
API->>ResponsesRouter: create_response(request)
ResponsesRouter->>OpenAI: Call responses API (input, tools)
OpenAI-->>ResponsesRouter: Returns function_call(s)
loop For each function_call
ResponsesRouter->>ToolDispatcher: dispatch_function(tool_call)
alt search
ToolDispatcher->>Retriever: handle_search(arguments, user)
Retriever->>DB: search/query (with node_type/node_name)
DB-->>Retriever: Results
Retriever-->>ToolDispatcher: Search results
else cognify/prune
ToolDispatcher->>DB: handle_cognify/prune(arguments, user)
DB-->>ToolDispatcher: Status/result
end
ToolDispatcher-->>ResponsesRouter: ToolCallOutput
end
ResponsesRouter-->>API: ResponseBody (tool_calls, usage, status)
    API-->>User: JSON response
```
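Based only on the flow in the diagram above, a client interaction with the new OpenAI-compatible endpoint might look roughly like the following sketch. The request and response field names ("input", "tools", "tool_calls", "usage", "status") come from the diagram; the URL, tool description shape, and everything else are assumptions, not taken from the implementation.

```python
# Hypothetical client-side sketch of the /api/v1/responses flow shown above.
import requests

payload = {
    "input": "What do we know about the Memgraph adapter?",
    "tools": [{"type": "function", "name": "search"}],  # assumed tool description shape
}

response = requests.post(
    "http://localhost:8000/api/v1/responses",  # assumed local deployment URL
    json=payload,
    timeout=60,
)
response.raise_for_status()

body = response.json()
# Per the diagram, the response body carries tool call outputs, usage, and a status.
print(body.get("tool_calls"), body.get("usage"), body.get("status"))
```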
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 17116131 | Triggered | Generic Password | 3b07f3c | examples/database_examples/neo4j_example.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn the best practices here.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflows, and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following these best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and ease remediation (a minimal sketch follows below)
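As an illustration of the "replace and store your secret safely" step, here is a minimal, hypothetical sketch of how the flagged example file could read its Neo4j credentials from environment variables instead of hardcoding them. The environment variable and function names are assumptions for illustration, not taken from the repository.

```python
# Hypothetical sketch: load Neo4j credentials from the environment instead of
# hardcoding them in examples/database_examples/neo4j_example.py.
import os

NEO4J_URL = os.environ.get("GRAPH_DATABASE_URL", "bolt://localhost:7687")
NEO4J_USER = os.environ.get("GRAPH_DATABASE_USERNAME", "neo4j")
NEO4J_PASSWORD = os.environ["GRAPH_DATABASE_PASSWORD"]  # fail fast if the secret is missing


def get_neo4j_config() -> dict:
    """Return connection settings without embedding the secret in source control."""
    return {"url": NEO4J_URL, "username": NEO4J_USER, "password": NEO4J_PASSWORD}
```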
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
🛑 Comments failed to post (70)
cognee/infrastructure/llm/prompts/render_prompt.py (1)
6-11: 🛠️ Refactor suggestion
⚠️ Potential issue
Docstring mismatch: function is synchronous, not asynchronous
The summary states “Render a Jinja2 template asynchronously,” but the implementation is purely synchronous. Please update the docstring to reflect the correct behavior.
Proposed diff:
- """ - Render a Jinja2 template asynchronously. + """ + Render a Jinja2 template synchronously. Set the base directory if not provided, initialize the Jinja2 environment, load the specified template, and render it using the provided context. <!-- suggestion_start --> <details> <summary>📝 Committable suggestion</summary> > ‼️ **IMPORTANT** > Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements. ```suggestion """ Render a Jinja2 template synchronously. Set the base directory if not provided, initialize the Jinja2 environment, load the specified template, and render it using the provided context.🤖 Prompt for AI Agents
In cognee/infrastructure/llm/prompts/render_prompt.py around lines 6 to 11, the docstring incorrectly states that the function renders a Jinja2 template asynchronously, but the implementation is synchronous. Update the docstring summary to say "Render a Jinja2 template synchronously" to accurately describe the function's behavior.
cognee/infrastructure/llm/tokenizer/tokenizer_interface.py (1)
12-21: 🛠️ Refactor suggestion
Improve docstring completeness by adding return type documentation.
The docstring is well-structured but missing the return type specification. Based on the relevant code snippets from the adapters, this method returns different types depending on the implementation (token IDs for TikToken, token strings for HuggingFace).
Apply this diff to add the missing return type documentation:
```diff
     def extract_tokens(self, text: str) -> List[Any]:
         """
         Extract tokens from the given text.

         Parameters:
         -----------

         - text (str): The input text from which to extract tokens.
+
+        Returns:
+        --------
+
+        - List[Any]: A list of tokens extracted from the input text. The specific
+          type of tokens depends on the implementation (e.g., token IDs or strings).
         """
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
    def extract_tokens(self, text: str) -> List[Any]:
        """
        Extract tokens from the given text.

        Parameters:
        -----------

        - text (str): The input text from which to extract tokens.

        Returns:
        --------

        - List[Any]: A list of tokens extracted from the input text. The specific
          type of tokens depends on the implementation (e.g., token IDs or strings).
        """
        raise NotImplementedError
```
In cognee/infrastructure/llm/tokenizer/tokenizer_interface.py around lines 12 to 21, the extract_tokens method's docstring lacks a return type description. Add a Returns section specifying that the method returns a list of tokens, noting that the exact type may vary by implementation (e.g., token IDs or token strings). This will complete the docstring and clarify the expected output.
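To make the `List[Any]` return type concrete, here is a small standalone sketch (not from the repository) showing why the element type differs between backends, assuming the adapters wrap tiktoken and a HuggingFace tokenizer as the comment suggests; the model names are examples only.

```python
# Illustrative only: shows why extract_tokens returns List[Any].
# Assumes tiktoken and transformers are installed; model names are examples.
import tiktoken
from transformers import AutoTokenizer

text = "Cognee builds knowledge graphs."

# TikToken-style backend: tokens are integer IDs.
tiktoken_ids = tiktoken.get_encoding("cl100k_base").encode(text)
print(tiktoken_ids[:5])   # a List[int], e.g. [34, 2082, ...]

# HuggingFace-style backend: tokens are subword strings.
hf_tokens = AutoTokenizer.from_pretrained("bert-base-uncased").tokenize(text)
print(hf_tokens[:5])      # a List[str], e.g. ['cog', '##nee', ...]
```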
Dockerfile_modal (1)
21-21: 💡 Verification agent
🧩 Analysis chain
Verify README.md necessity in Docker build.
Including README.md in the initial copy step is efficient if it's required for the build process. Please ensure that README.md is actually needed during the Docker build.
🏁 Script executed:
```bash
#!/bin/bash
# Check if README.md is referenced in pyproject.toml or used during Poetry installation
echo "Checking if README.md is referenced in pyproject.toml..."
grep -i "readme" pyproject.toml || echo "No readme references found in pyproject.toml"

echo -e "\nChecking for README references in Poetry-related files..."
rg -i "readme" --type toml
```

Length of output: 459
Let’s locate the actual file paths and check all references before concluding.
🏁 Script executed:
```bash
#!/bin/bash
set -eux

# Locate project files
echo "Locating pyproject.toml..."
fd --hidden --max-depth 4 --glob pyproject.toml

echo "Locating Dockerfile_modal..."
fd --hidden --max-depth 4 --glob Dockerfile_modal

# Use the first matches
PY_PATH=$(fd --hidden --max-depth 4 --glob pyproject.toml | head -n1)
DOCKER_PATH=$(fd --hidden --max-depth 4 --glob Dockerfile_modal | head -n1)
echo "pyproject.toml found at: $PY_PATH"
echo "Dockerfile_modal found at: $DOCKER_PATH"

# Check for README references in pyproject.toml
echo "Searching $PY_PATH for README:"
grep -i "readme" "$PY_PATH" || echo "No README references in $PY_PATH"

# List all COPY lines in Dockerfile_modal
echo "Listing COPY instructions in $DOCKER_PATH:"
grep -n "COPY " "$DOCKER_PATH"

# Show context around README.md in Dockerfile_modal
echo "Context around README.md usage:"
grep -n -C3 "README.md" "$DOCKER_PATH" || echo "No further README.md usage found"
```

Length of output: 1195
Let’s locate the actual `pyproject.toml` and inspect it for any `readme` metadata.
🏁 Script executed:
```bash
#!/bin/bash
set -eux

# Search for pyproject.toml anywhere in the repo
echo "Searching for pyproject.toml in the repository..."
find . -type f -iname 'pyproject.toml' -print || echo "No pyproject.toml found"

# If found, check for a 'readme' field
echo -e "\nChecking each pyproject.toml for a readme entry:"
find . -type f -iname 'pyproject.toml' -exec grep -H -i '^readme' {} \; || echo "No 'readme' metadata entries found"
```

Length of output: 353
Remove unnecessary README.md from Dockerfile_modal COPY
The Dockerfile_modal is copying `README.md` (alongside `pyproject.toml` and `poetry.lock`) on line 21, but there’s no `readme` field in any `pyproject.toml` and no evidence that `README.md` is used during the build. Keeping it will bloat your image and may even break the build if the file isn’t present in the context.

• File: Dockerfile_modal
• Line 21:

```diff
- COPY pyproject.toml poetry.lock README.md /app/
+ COPY pyproject.toml poetry.lock /app/
```

If you do need `README.md` at build time (for example, for packaging or documentation), please add a comment in the Dockerfile explaining its usage or adjust its path accordingly.
In Dockerfile_modal at line 21, the COPY instruction includes README.md, but there is no indication that README.md is required during the build process or referenced in pyproject.toml. To fix this, remove README.md from the COPY command to avoid unnecessary image bloat and potential build errors if the file is missing. If README.md is actually needed, add a comment explaining its purpose or adjust the path accordingly.
Dockerfile (1)
32-44: 💡 Verification agent
🧩 Analysis chain
Verify compatibility with Python 3.12 upgrade and uv dependency resolution.
The change from Poetry to uv and Python 3.11 to 3.12 represents significant infrastructure changes that could affect:
- Package compatibility with Python 3.12
- Dependency resolution differences between Poetry and uv
- Lock file format changes
Run the following script to verify the compatibility:
🏁 Script executed:
```bash
#!/bin/bash
# Description: Check for Python 3.12 compatibility issues and verify uv.lock exists

# Check if uv.lock exists and is valid
if [ -f "uv.lock" ]; then
  echo "✓ uv.lock file exists"
else
  echo "✗ uv.lock file missing - required for uv sync"
fi

# Check for Python version constraints in pyproject.toml
rg -A 5 -B 5 "python.*=" pyproject.toml

# Look for any Python 3.11 specific code or comments
rg -i "python.*3\.11|py311" --type py
```

Length of output: 355
Fix missing uv.lock and update Python version references
The Dockerfile’s `uv sync` steps require a committed `uv.lock`, and there’s still a hard-coded Python 3.11 reference in the modal dashboard. Please:

- Add `uv.lock` to the repository root so that `RUN uv sync … --frozen` can resolve dependencies reproducibly.
- Update cognee/eval_framework/modal_eval_dashboard.py:

```diff
- modal.Image.debian_slim(python_version="3.11")
+ modal.Image.debian_slim(python_version="3.12")
```

- Verify and update your project’s Python version constraint (e.g. in pyproject.toml or equivalent) to allow 3.12.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In Dockerfile lines 32 to 44, the uv sync commands require a uv.lock file to ensure reproducible dependency resolution, but uv.lock is missing from the repository. Add a valid uv.lock file at the repository root so that the uv sync commands with --frozen flag can work correctly. Additionally, update the Python version constraints in pyproject.toml and any hard-coded Python 3.11 references, such as in cognee/eval_framework/modal_eval_dashboard.py, to support Python 3.12 compatibility.
cognee/infrastructure/databases/vector/__init__.py (1)
6-6: 🛠️ Refactor suggestion
Handle the unused import warning and explicitly expose the adapter.
The static analyzer flagged `use_vector_adapter` as an unused import. Since it’s meant to be part of the public API, consider adding an `__all__` list to this module (e.g., `__all__ = [..., "use_vector_adapter"]`) or remove the import if it’s not intended for direct external usage.

🧰 Tools
🪛 Ruff (0.11.9)
6-6: `.use_vector_adapter.use_vector_adapter` imported but unused; consider removing, adding to `__all__`, or using a redundant alias (F401)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/vector/__init__.py at line 6, the import of use_vector_adapter is flagged as unused. To fix this, add an __all__ list to explicitly declare use_vector_adapter as part of the public API, for example __all__ = ["use_vector_adapter"], so the import is recognized as intentional and exposed properly.
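A minimal sketch of the suggested `__all__` fix; the other export shown here is an illustrative placeholder, not necessarily the module's actual contents.

```python
# cognee/infrastructure/databases/vector/__init__.py (sketch; surrounding imports are assumed)
from .create_vector_engine import create_vector_engine  # hypothetical existing export
from .use_vector_adapter import use_vector_adapter

# Declaring __all__ marks use_vector_adapter as an intentional public re-export,
# which also silences Ruff's F401 "imported but unused" warning.
__all__ = ["create_vector_engine", "use_vector_adapter"]
```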
cognee/infrastructure/llm/prompts/cot_validation_system_prompt.txt (1)
1-1:
⚠️ Potential issue
Fix grammar error in the prompt.
The static analysis tool correctly identified a subject-verb disagreement. "You are a helpful agent who are allowed" should be "You are a helpful agent who is allowed."
Apply this diff to fix the grammar:
```diff
-You are a helpful agent who are allowed to use only the provided question answer and context.
+You are a helpful agent who is allowed to use only the provided question answer and context.
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
You are a helpful agent who is allowed to use only the provided question answer and context.
```

🧰 Tools
🪛 LanguageTool
[uncategorized] ~1-~1: This verb does not appear to agree with the subject. Consider using a different form.
Context: You are a helpful agent who are allowed to use only the provided questi... (AI_EN_LECTOR_REPLACEMENT_VERB_AGREEMENT)
🤖 Prompt for AI Agents
In cognee/infrastructure/llm/prompts/cot_validation_system_prompt.txt at line 1, fix the subject-verb agreement by changing "who are allowed" to "who is allowed" to correct the grammar in the prompt.
cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx (1)
5-5: 💡 Verification agent
🧩 Analysis chain
Verify dataset prop type consistency across frontend components.
The change from `{ id: string }` to `{ name: string }` aligns with the frontend standardization efforts mentioned in the AI summary. Ensure this change is consistently applied across all components that use dataset props.
🏁 Script executed:
```bash
#!/bin/bash
# Description: Verify consistent dataset prop types across frontend components
# Expected: All dataset props should use {name: string} consistently

# Search for dataset prop type definitions
rg -A 2 -B 2 "dataset.*:\s*\{.*\}" cognee-frontend/src --type tsx --type ts

# Search for Explorer component usage
rg -A 3 -B 1 "<Explorer.*dataset" cognee-frontend/src --type tsx
```

Length of output: 200
To ensure we catch all occurrences without relying on rg’s built-in types, let’s search by file extension instead:
🏁 Script executed:
```bash
#!/bin/bash
# Verify all dataset prop definitions use { name: string }
rg -n "dataset\s*:\s*{[^}]+}" --glob '*.ts' --glob '*.tsx' cognee-frontend/src

# Verify all <Explorer> usages pass a dataset prop
rg -n "<Explorer[^>]*\bdataset\b" --glob '*.tsx' cognee-frontend/src
```

Length of output: 481
Let’s check for any lingering `dataset.id` usages and locate the Explorer component’s prop definition:
🏁 Script executed:
```bash
#!/bin/bash
# 1. Look for any dataset.id usages
rg -n "dataset\.id" --glob '*.ts' --glob '*.tsx' cognee-frontend/src

# 2. Find all dataset prop declarations (inline or via interface)
rg -n "dataset\s*:" --glob '*.ts' --glob '*.tsx' cognee-frontend/src

# 3. Locate the Explorer component definition and its props
rg -n "export \(function\|const\) Explorer" -A 10 --glob '*.ts' --glob '*.tsx' cognee-frontend/src
```

Length of output: 3262
Fix dataset prop type inconsistencies across the codebase
While `ExploreStep` and `Explorer` now use `dataset: { name: string }`, many other modules still expect or reference `dataset.id`. This mismatch will lead to type errors and broken runtime behavior. Please update all consumers and definitions of “dataset” to use the standardized `{ name: string }` shape (or a shared `Dataset` type) consistently.

Locations needing attention:

- `src/modules/ingestion/addData.ts`: function signature uses `{ id?: string, name?: string }`
- `src/app/page.tsx`:
  - `openDatasetData(dataset: { id: string })`
  - `onDataAdd(dataset: { id: string }, …)`
  - `onDatasetCognify(dataset: { id: string, name: string })`
- `src/modules/datasets/getDatasetData.ts`: expects `{ id: string }`
- `src/modules/datasets/deleteDataset.ts`: expects `{ id: string }`
- `src/modules/datasets/cognifyDataset.ts`: signature uses `{ id?: string, name?: string }`
- `src/modules/ingestion/useDatasets.ts`: references `dataset.id` for status mapping
- Any other callsites or interfaces where `dataset.id` is used

Please refactor these to rely solely on `dataset.name` (or extend the shared `Dataset` interface) so that all consumers align with the new standard.
In cognee-frontend/src/app/wizard/ExploreStep/ExploreStep.tsx at line 5, the dataset prop type was changed from { id: string } to { name: string } to standardize usage. However, many other files still use dataset.id, causing type inconsistencies and potential runtime errors. To fix this, update all dataset prop definitions, function signatures, and usages across the codebase—including src/modules/ingestion/addData.ts, src/app/page.tsx, src/modules/datasets/getDatasetData.ts, src/modules/datasets/deleteDataset.ts, src/modules/datasets/cognifyDataset.ts, and src/modules/ingestion/useDatasets.ts—to use { name: string } consistently or a shared Dataset interface that includes name but not id. Replace all references to dataset.id with dataset.name accordingly.
cognee/modules/data/methods/get_unique_dataset_id.py (1)
5-6: 🛠️ Refactor suggestion
Address potential UUID collision risk and async function design.
The function has several concerns:
Collision Risk: Direct string concatenation of `dataset_name` and `user.id` could lead to collisions. For example, `dataset_name="user123"` with `user.id="456"` produces the same input as `dataset_name="user123456"` with `user.id=""`.

Unnecessary async: The function is declared as `async` but performs no asynchronous operations.

Consider this improved implementation:
```diff
-async def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
-    return uuid5(NAMESPACE_OID, f"{dataset_name}{str(user.id)}")
+def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
+    """Generate a deterministic UUID for a dataset based on name and user."""
+    if not dataset_name or not dataset_name.strip():
+        raise ValueError("Dataset name cannot be empty")
+    # Use a delimiter to prevent collision
+    unique_string = f"{dataset_name}::{user.id}"
+    return uuid5(NAMESPACE_OID, unique_string)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
def get_unique_dataset_id(dataset_name: str, user: User) -> UUID:
    """Generate a deterministic UUID for a dataset based on name and user."""
    if not dataset_name or not dataset_name.strip():
        raise ValueError("Dataset name cannot be empty")
    # Use a delimiter to prevent collisions between name and ID
    unique_string = f"{dataset_name}::{user.id}"
    return uuid5(NAMESPACE_OID, unique_string)
```

🤖 Prompt for AI Agents
In cognee/modules/data/methods/get_unique_dataset_id.py at lines 5 to 6, the function get_unique_dataset_id is unnecessarily declared async and concatenates dataset_name and user.id directly, risking UUID collisions. Remove the async keyword since no await is used, and instead of simple concatenation, combine dataset_name and user.id with a clear delimiter or use a tuple-like structure to ensure uniqueness and avoid collisions before passing to uuid5.
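To make the collision concern concrete, here is a small standalone demonstration (illustrative values only) that the concatenation scheme maps two different (dataset_name, user id) pairs to the same UUID, while a delimiter keeps them distinct.

```python
from uuid import uuid5, NAMESPACE_OID

# Two different (dataset_name, user id) pairs that concatenate to the same string.
a = uuid5(NAMESPACE_OID, "user123" + "456")      # dataset_name="user123", user id="456"
b = uuid5(NAMESPACE_OID, "user123456" + "")      # dataset_name="user123456", user id=""
print(a == b)  # True: the two datasets would collide

# With an explicit delimiter, the inputs stay distinct.
c = uuid5(NAMESPACE_OID, "user123" + "::" + "456")
d = uuid5(NAMESPACE_OID, "user123456" + "::" + "")
print(c == d)  # False
```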
cognee/api/v1/config/config.py (1)
158-158:
⚠️ Potential issue
Fix inconsistent error handling.
This method uses `AttributeError` while all other similar config setter methods in this file use `InvalidAttributeError` with the `message=` parameter. This creates inconsistency in error handling.
Apply this diff to maintain consistency:
- raise AttributeError(f"'{key}' is not a valid attribute of the config.") + raise InvalidAttributeError( + message=f"'{key}' is not a valid attribute of the config." + )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
    raise InvalidAttributeError(
        message=f"'{key}' is not a valid attribute of the config."
    )
```

🤖 Prompt for AI Agents
In cognee/api/v1/config/config.py at line 158, replace the use of AttributeError with InvalidAttributeError and pass the error message using the message= parameter to align with the error handling style used in other config setter methods. This will ensure consistent error handling across the file.
cognee/infrastructure/databases/vector/use_vector_adapter.py (1)
4-5: 🛠️ Refactor suggestion
Enhance function robustness and documentation.
The core functionality looks good for enabling dynamic vector adapter registration. Consider these improvements:
```diff
-def use_vector_adapter(vector_db_name, vector_db_adapter):
+def use_vector_adapter(vector_db_name: str, vector_db_adapter) -> None:
+    """
+    Register a vector database adapter for dynamic use.
+
+    Args:
+        vector_db_name: Name of the vector database provider
+        vector_db_adapter: Adapter class or instance for the provider
+    """
+    if not vector_db_name:
+        raise ValueError("vector_db_name cannot be empty")
+    if vector_db_adapter is None:
+        raise ValueError("vector_db_adapter cannot be None")
+
     supported_databases[vector_db_name] = vector_db_adapter
```

This adds type hints, documentation, and basic input validation to improve code quality and prevent common usage errors.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
def use_vector_adapter(vector_db_name: str, vector_db_adapter) -> None:
    """
    Register a vector database adapter for dynamic use.

    Args:
        vector_db_name: Name of the vector database provider
        vector_db_adapter: Adapter class or instance for the provider
    """
    if not vector_db_name:
        raise ValueError("vector_db_name cannot be empty")
    if vector_db_adapter is None:
        raise ValueError("vector_db_adapter cannot be None")

    supported_databases[vector_db_name] = vector_db_adapter
```

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/vector/use_vector_adapter.py at lines 4 to 5, the function use_vector_adapter lacks type hints, documentation, and input validation. Add appropriate type hints for the parameters and return type, include a docstring explaining the function's purpose and usage, and add basic input validation to check that vector_db_name is a string and vector_db_adapter is a valid adapter object before registering it in supported_databases.
cognee/infrastructure/databases/graph/use_graph_adapter.py (1)
4-5: 🛠️ Refactor suggestion
Fix misleading parameter name and add input validation.
The parameter name `vector_db_name` is misleading since this function registers graph database adapters, not vector database adapters. Additionally, the function lacks input validation and documentation.
Apply this diff to improve the implementation:
```diff
-def use_graph_adapter(vector_db_name, vector_db_adapter):
-    supported_databases[vector_db_name] = vector_db_adapter
+def use_graph_adapter(graph_db_name: str, graph_db_adapter):
+    """
+    Register or update a graph database adapter in the supported databases registry.
+
+    Args:
+        graph_db_name (str): The name identifier for the graph database
+        graph_db_adapter: The adapter class for the graph database
+
+    Raises:
+        ValueError: If graph_db_name is empty or None
+        TypeError: If graph_db_adapter is None
+    """
+    if not graph_db_name:
+        raise ValueError("Graph database name cannot be empty or None")
+    if graph_db_adapter is None:
+        raise TypeError("Graph database adapter cannot be None")
+
+    supported_databases[graph_db_name] = graph_db_adapter
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
def use_graph_adapter(graph_db_name: str, graph_db_adapter):
    """
    Register or update a graph database adapter in the supported databases registry.

    Args:
        graph_db_name (str): The name identifier for the graph database
        graph_db_adapter: The adapter class for the graph database

    Raises:
        ValueError: If graph_db_name is empty or None
        TypeError: If graph_db_adapter is None
    """
    if not graph_db_name:
        raise ValueError("Graph database name cannot be empty or None")
    if graph_db_adapter is None:
        raise TypeError("Graph database adapter cannot be None")

    supported_databases[graph_db_name] = graph_db_adapter
```

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/use_graph_adapter.py around lines 4 to 5, rename the parameter vector_db_name to graph_db_name to accurately reflect that the function registers graph database adapters. Add input validation to check that graph_db_name is a non-empty string and vector_db_adapter is a valid adapter object. Also, include a docstring explaining the function's purpose, parameters, and behavior.
cognee/modules/observability/get_observe.py (1)
5-11:
⚠️ Potential issue
Fix missing return statement and improve error handling.
The function has a critical issue where it doesn't return anything when `monitoring` is not `Observer.LANGFUSE`, which will return `None` implicitly. Additionally, the function lacks error handling for import failures and proper documentation.
Apply this diff to fix the issues:
```diff
+from typing import Optional, Callable
+
 def get_observe():
+    """
+    Get the appropriate observation decorator based on the configured monitoring tool.
+
+    Returns:
+        Optional[Callable]: The observe decorator if available, None otherwise
+
+    Raises:
+        ImportError: If the required monitoring tool package is not installed
+    """
     monitoring = get_base_config().monitoring_tool

     if monitoring == Observer.LANGFUSE:
-        from langfuse.decorators import observe
-
-        return observe
+        try:
+            from langfuse.decorators import observe
+            return observe
+        except ImportError as e:
+            raise ImportError(f"Langfuse package not installed: {e}")
+
+    # Return None for unsupported monitoring tools
+    return None
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
from typing import Optional, Callable


def get_observe():
    """
    Get the appropriate observation decorator based on the configured monitoring tool.

    Returns:
        Optional[Callable]: The observe decorator if available, None otherwise

    Raises:
        ImportError: If the required monitoring tool package is not installed
    """
    monitoring = get_base_config().monitoring_tool

    if monitoring == Observer.LANGFUSE:
        try:
            from langfuse.decorators import observe
            return observe
        except ImportError as e:
            raise ImportError(f"Langfuse package not installed: {e}")

    # Return None for unsupported monitoring tools
    return None
```

🧰 Tools
🪛 Pylint (3.3.7)
[error] 9-9: Unable to import 'langfuse.decorators'
(E0401)
[convention] 9-9: Import outside toplevel (langfuse.decorators.observe)
(C0415)
[refactor] 5-5: Either all return statements in a function should return an expression, or none of them should.
(R1710)
🤖 Prompt for AI Agents
In cognee/modules/observability/get_observe.py around lines 5 to 11, the function get_observe lacks a return statement when monitoring is not Observer.LANGFUSE, causing it to implicitly return None. To fix this, add a default return value or raise an appropriate exception for unsupported monitoring tools. Also, wrap the import statement in a try-except block to handle import errors gracefully and add a docstring to document the function's behavior and possible exceptions.
cognee-frontend/src/modules/datasets/cognifyDataset.ts (1)
3-3: 🛠️ Refactor suggestion
Add validation for required parameters.
The function signature makes both `id` and `name` optional, but at least one should be provided for the API request to be meaningful.

```diff
-export default function cognifyDataset(dataset: { id?: string, name?: string }) {
+export default function cognifyDataset(dataset: { id?: string, name?: string }) {
+  if (!dataset.id && !dataset.name) {
+    throw new Error('Either dataset id or name must be provided');
+  }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
export default function cognifyDataset(dataset: { id?: string, name?: string }) {
  if (!dataset.id && !dataset.name) {
    throw new Error('Either dataset id or name must be provided');
  }

  // …rest of the existing function body…
}
```

🤖 Prompt for AI Agents
In cognee-frontend/src/modules/datasets/cognifyDataset.ts at line 3, the function parameters id and name are both optional, but the function requires at least one to be provided. Add validation inside the function to check if either id or name is present; if neither is provided, throw an error or return early to prevent meaningless API requests.
cognee-frontend/src/app/page.tsx (1)
63-69:
⚠️ Potential issue
Add null safety check for dataset lookup.
The callback uses non-null assertion (`dataset!`) without verifying that the dataset was found, which could cause a runtime error if `selectedDataset` doesn't match any dataset ID.

```diff
 const onCognify = useCallback(() => {
   const dataset = datasets.find((dataset) => dataset.id === selectedDataset);

+  if (!dataset) {
+    showNotification("Selected dataset not found. Please try again.", 5000);
+    return Promise.reject(new Error("Dataset not found"));
+  }
+
   return onDatasetCognify({
-    id: dataset!.id,
-    name: dataset!.name,
+    id: dataset.id,
+    name: dataset.name,
   });
 }, [datasets, onDatasetCognify, selectedDataset]);
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
const onCognify = useCallback(() => {
  const dataset = datasets.find((dataset) => dataset.id === selectedDataset);

  if (!dataset) {
    showNotification("Selected dataset not found. Please try again.", 5000);
    return Promise.reject(new Error("Dataset not found"));
  }

  return onDatasetCognify({
    id: dataset.id,
    name: dataset.name,
  });
}, [datasets, onDatasetCognify, selectedDataset]);
```

🤖 Prompt for AI Agents
In cognee-frontend/src/app/page.tsx around lines 63 to 69, the onCognify callback uses non-null assertions on dataset without checking if the dataset exists, risking runtime errors. Modify the callback to first check if the dataset is found; if not, handle the null case gracefully (e.g., return early or show an error). Only access dataset properties after confirming it is not null or undefined.
cognee/eval_framework/metrics_dashboard.py (1)
170-170: 💡 Verification agent
🧩 Analysis chain
Verify breaking change in return type.
The function now returns HTML content instead of the output file path. This is a breaking change that could affect existing callers expecting a file path.
Run the following script to verify how this function is used across the codebase:
🏁 Script executed:
```bash
#!/bin/bash
# Description: Check all usages of create_dashboard function to verify the impact of changing return type from file path to HTML content.

# Search for function calls to create_dashboard
rg -A 5 -B 2 "create_dashboard\(" --type py

# Search for any variable assignments or return value usage
ast-grep --pattern $'$var = create_dashboard($$$)'

# Search for any code that might expect a file path return
rg -A 3 -B 3 "\.html" --type py | rg -A 3 -B 3 "create_dashboard"
```

Length of output: 4871
Update create_dashboard’s return type and dependent code
The change in `create_dashboard` from returning the output file path to returning the HTML content is a breaking change. Callers and tests that expect a file path must be updated to handle HTML output.

• In `cognee/eval_framework/metrics_dashboard.py`, update the docstring/signature to state that the function returns the rendered HTML, not the file path.
• In `cognee/tests/unit/eval_framework/dashboard_test.py`:
  – Rename the returned variable to `dashboard_html` for clarity.
  – Replace the file-path assertion with an HTML-content check, then assert file creation separately.
  Diff example:

```diff
- output = create_dashboard(metrics_path, aggregate_metrics_path, output_file, "Test Benchmark")
- self.assertEqual(output, output_file)
+ dashboard_html = create_dashboard(metrics_path, aggregate_metrics_path, output_file, "Test Benchmark")
+ self.assertIn("<html", dashboard_html)
+ self.assertTrue(os.path.exists(output_file))
```

• In `cognee/eval_framework/modal_run_eval.py`, review how `html_output = create_dashboard(...)` is used downstream and ensure it's treated as HTML, not a file path.
• In `cognee/eval_framework/run_eval.py`, consider whether you need to capture the return value now that it's HTML (or explicitly ignore it).
• In `cognee/eval_framework/analysis/dashboard_generator.py`, if there's a duplicate `create_dashboard`, ensure its signature and return semantics match.

Please update these locations to align with the new HTML-return behavior.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/eval_framework/metrics_dashboard.py at line 170, the create_dashboard function now returns HTML content instead of a file path, which is a breaking change. Update the function's docstring and signature to reflect that it returns rendered HTML. Then, in cognee/tests/unit/eval_framework/dashboard_test.py, rename variables to indicate HTML content, replace file path assertions with checks on the HTML content, and separately assert that the output file is created. Also, review and update all callers in cognee/eval_framework/modal_run_eval.py, run_eval.py, and analysis/dashboard_generator.py to handle the returned HTML correctly instead of expecting a file path, adjusting variable names and logic as needed to align with this change.
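For clarity, here is a minimal sketch (not the project's actual implementation; the signature and templating are assumptions based on the discussion above) of a `create_dashboard` that writes the output file and returns the rendered HTML, which is the behavior callers should now assume.

```python
# Illustrative sketch only; the real function lives in cognee/eval_framework/metrics_dashboard.py
# and builds its charts from the metrics files.
def create_dashboard(metrics_path: str, aggregate_metrics_path: str,
                     output_file: str, benchmark_name: str) -> str:
    # ... build figures from the metrics files (omitted in this sketch) ...
    html = f"<html><body><h1>{benchmark_name}</h1><!-- charts go here --></body></html>"

    # The file is still written for consumers that read it from disk ...
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(html)

    # ... but callers should treat the return value as HTML content, not a path.
    return html
```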
cognee/tests/test_relational_db_migration.py (1)
161-162:
⚠️ Potential issue
Fix potential NameError for uninitialized variables.
The static analysis correctly identifies that `node_count` and `edge_count` may be used before assignment if an unsupported graph database provider is encountered.
Apply this diff to initialize the variables and improve error handling:
```diff
     else:
         raise ValueError(f"Unsupported graph database provider: {graph_db_provider}")

+    # Ensure variables are initialized before assertions
+    if 'node_count' not in locals() or 'edge_count' not in locals():
+        raise ValueError(f"Failed to retrieve node/edge counts for provider: {graph_db_provider}")
+
     # NOTE: Because of the different size of the postgres and sqlite databases,
     # different number of nodes and edges are expected
     assert node_count == 543, f"Expected 543 nodes, got {node_count}"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
    else:
        raise ValueError(f"Unsupported graph database provider: {graph_db_provider}")

    # Ensure variables are initialized before assertions
    if 'node_count' not in locals() or 'edge_count' not in locals():
        raise ValueError(f"Failed to retrieve node/edge counts for provider: {graph_db_provider}")

    # NOTE: Because of the different size of the postgres and sqlite databases,
    # different number of nodes and edges are expected
    assert node_count == 543, f"Expected 543 nodes, got {node_count}"
    assert edge_count == 1317, f"Expected 1317 edges, got {edge_count}"
```

🧰 Tools
🪛 Pylint (3.3.7)
[error] 161-161: Possibly using variable 'node_count' before assignment
(E0606)
[error] 162-162: Possibly using variable 'edge_count' before assignment
(E0606)
🤖 Prompt for AI Agents
In cognee/tests/test_relational_db_migration.py around lines 161 to 162, initialize the variables node_count and edge_count before their usage to prevent potential NameError if an unsupported graph database provider is encountered. Add default initializations for these variables at the start of the relevant code block and include error handling to manage unsupported providers gracefully, ensuring the variables are always assigned before the assertions.
cognee/shared/logging_utils.py (1)
7-7:
⚠️ Potential issue
Fix import issues identified by static analysis.
The static analysis tools correctly identified import-related issues that should be addressed for code quality:
```diff
 import os
 import sys
 import threading
 import logging
-import structlog
-import traceback
 import platform
-from datetime import datetime
-from pathlib import Path
-import importlib.metadata
+import traceback
+from datetime import datetime
+from pathlib import Path
+
+import structlog

-from cognee import __version__ as cognee_version
+from cognee import __version__ as cognee_version
```

The `importlib.metadata` import should be removed as it's unused (`structlog.__version__` is accessed directly).

Also applies to: 10-10, 12-12
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 7-7: standard import "platform" should be placed before third party import "structlog"
(C0411)
🤖 Prompt for AI Agents
In cognee/shared/logging_utils.py at lines 7, 10, and 12, remove the import of importlib.metadata since it is unused and structlog.__version__ is accessed directly. This will fix the import issues flagged by static analysis and improve code quality.
alembic/versions/1d0bb7fede17_add_pipeline_run_status.py (1)
32-33: 🛠️ Refactor suggestion
Consider documenting the downgrade limitation.
The empty downgrade function could be problematic if rollback is needed. PostgreSQL doesn't easily support removing enum values once they're in use, but this should be documented.
Consider adding a comment explaining the limitation:
```diff
 def downgrade() -> None:
-    pass
+    # Note: PostgreSQL does not support removing enum values that are in use.
+    # Manual intervention may be required if rollback is necessary.
+    pass
```

Alternatively, consider implementing a check to ensure the enum value is not in use before attempting removal.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
def downgrade() -> None:
    # Note: PostgreSQL does not support removing enum values that are in use.
    # Manual intervention may be required if rollback is necessary.
    pass
```

🤖 Prompt for AI Agents
In alembic/versions/1d0bb7fede17_add_pipeline_run_status.py at lines 32-33, the downgrade function is empty which can cause issues during rollback since PostgreSQL does not easily support removing enum values. Add a comment inside the downgrade function explaining this limitation and why the downgrade is not implemented. Optionally, you can add a check to ensure the enum value is not in use before attempting removal, but at minimum document the limitation clearly.
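As a sketch of the "check before removal" alternative mentioned above, the downgrade could verify that no rows still reference the new enum value before doing anything destructive. The table, column, and enum value names below are assumptions for illustration, not taken from the migration itself.

```python
# Hypothetical sketch of a defensive downgrade; table/column/value names are assumed.
import sqlalchemy as sa
from alembic import op


def downgrade() -> None:
    bind = op.get_bind()
    in_use = bind.execute(
        sa.text("SELECT COUNT(*) FROM pipeline_runs WHERE status = :status"),
        {"status": "DATASET_PROCESSING_INITIATED"},  # assumed enum value added by the upgrade
    ).scalar()

    if in_use:
        # PostgreSQL cannot drop an enum value that is still referenced; abort rather than corrupt data.
        raise RuntimeError(
            f"Cannot downgrade: {in_use} pipeline_runs rows still use the new status value."
        )
    # Even when unused, removing an enum value requires rebuilding the type, so we stop here.
```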
cognee/modules/pipelines/operations/log_pipeline_run_initiated.py (1)
6-10: 💡 Verification agent
🧩 Analysis chain
Verify pipeline_id type consistency.
The `pipeline_id` parameter is typed as `str`, but the `PipelineRun` model's `pipeline_id` field is defined as `UUID` type in the relevant code snippets. This could lead to type conversion issues.
🏁 Script executed:
```bash
#!/bin/bash
# Description: Check how pipeline_id is used across the codebase to verify expected type
# Expected: Find usages that clarify whether pipeline_id should be str or UUID

echo "=== Searching for pipeline_id field usage in PipelineRun model ==="
ast-grep --pattern 'pipeline_id = Column($$$)'

echo -e "\n=== Searching for pipeline_id parameter usage in other functions ==="
rg -A 3 -B 3 "pipeline_id.*:" --type py

echo -e "\n=== Searching for log_pipeline_run_initiated function calls ==="
rg -A 5 -B 2 "log_pipeline_run_initiated" --type py
```

Length of output: 6983
Ensure pipeline_id parameters use UUID, not str
The `PipelineRun.pipeline_id` column is defined as a UUID, but all `log_pipeline_run_*` functions currently annotate `pipeline_id: str`. Call sites already pass a `uuid.UUID`, so we should update the signatures (and any related tooling) to match:

• Update in cognee/modules/pipelines/operations/log_pipeline_run_initiated.py
• Also apply to log_pipeline_run_start, log_pipeline_run_complete, and log_pipeline_run_error

Example diff for `log_pipeline_run_initiated`:

```diff
-async def log_pipeline_run_initiated(pipeline_id: str, pipeline_name: str, dataset_id: UUID):
+async def log_pipeline_run_initiated(pipeline_id: UUID, pipeline_name: str, dataset_id: UUID):
```

Repeat for the other three functions to prevent type mismatches when persisting to `PipelineRun` (`pipeline_id = Column(UUID, …)`).

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion
async def log_pipeline_run_initiated(pipeline_id: UUID, pipeline_name: str, dataset_id: UUID):
    pipeline_run = PipelineRun(
        pipeline_run_id=uuid4(),
        pipeline_name=pipeline_name,
        pipeline_id=pipeline_id,
```

🤖 Prompt for AI Agents
In cognee/modules/pipelines/operations/log_pipeline_run_initiated.py around lines 6 to 10, the pipeline_id parameter is typed as str but the PipelineRun model expects a UUID type for pipeline_id. Change the type annotation of pipeline_id from str to UUID in the function signature to ensure type consistency and prevent conversion issues. Also, update the other related log_pipeline_run_* functions similarly to use UUID for pipeline_id.
cognee/infrastructure/llm/tokenizer/HuggingFace/adapter.py (1)
67-78:
⚠️ Potential issue

Fix parameter name to match interface contract.

The static analysis correctly identifies that the parameter name should be `token` instead of `encoding` to match the interface definition in `TokenizerInterface`.

```diff
-    def decode_single_token(self, encoding: int):
+    def decode_single_token(self, token: int):
         """
         Attempt to decode a single token from its encoding, which is not implemented in this
         tokenizer.

         Parameters:
         -----------

-            - encoding (int): The integer encoding of the token to decode.
+            - token (int): The integer encoding of the token to decode.
         """
         # HuggingFace tokenizer doesn't have the option to decode tokens
         raise NotImplementedError
```

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
    def decode_single_token(self, token: int):
        """
        Attempt to decode a single token from its encoding, which is not implemented in this
        tokenizer.

        Parameters:
        -----------

            - token (int): The integer encoding of the token to decode.
        """
        # HuggingFace tokenizer doesn't have the option to decode tokens
        raise NotImplementedError
```

🧰 Tools
🪛 Pylint (3.3.7)
[warning] 67-67: Parameter 'token' has been renamed to 'encoding' in overriding 'HuggingFaceTokenizer.decode_single_token' method
(W0237)
🤖 Prompt for AI Agents
In cognee/infrastructure/llm/tokenizer/HuggingFace/adapter.py between lines 67 and 78, rename the parameter of the decode_single_token method from 'encoding' to 'token' to match the interface definition in TokenizerInterface. This ensures consistency with the expected method signature and resolves the static analysis warning.
cognee-frontend/src/modules/ingestion/DataView/DataView.tsx (1)
36-36: 🛠️ Refactor suggestion
Consider renaming the component to avoid shadowing.
The component name `DataView` shadows the global `DataView` constructor, which could lead to confusion when debugging or referencing the global object.

Consider renaming the component to something more specific like `DataSetView` or `CogneeDataView`:

```diff
-export default function DataView({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) {
+export default function DataSetView({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) {
```

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```diff
-export default function DataView({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) {
+export default function DataSetView({ datasetId, data, onClose, onDataAdd, onCognify }: DataViewProps) {
```

🧰 Tools
🪛 Biome (1.9.4)
[error] 36-36: Do not shadow the global "DataView" property.
Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.
(lint/suspicious/noShadowRestrictedNames)
🤖 Prompt for AI Agents
In cognee-frontend/src/modules/ingestion/DataView/DataView.tsx at line 36, the component name DataView shadows the global DataView constructor, which can cause confusion. Rename the component to a more specific name such as DataSetView or CogneeDataView throughout the file, including the export statement and any references to this component, to avoid shadowing the global object.
cognee/infrastructure/engine/models/DataPoint.py (1)
159-159:
⚠️ Potential issue

Fix class method parameter convention violation.

The static analysis correctly identifies that class methods should use `cls` as the first parameter instead of `self`. This violates Python conventions and could cause confusion.

Apply this fix:

```diff
 @classmethod
-def from_json(self, json_str: str):
+def from_json(cls, json_str: str):
     """
     Deserialize a DataPoint instance from a JSON string.
     ...
     """
-    return self.model_validate_json(json_str)
+    return cls.model_validate_json(json_str)

 @classmethod
-def from_pickle(self, pickled_data: bytes):
+def from_pickle(cls, pickled_data: bytes):
     """
     Deserialize a DataPoint instance from a pickled byte stream.
     ...
     """
     data = pickle.loads(pickled_data)
-    return self(**data)
+    return cls(**data)
```

Also applies to: 195-195
🧰 Tools
🪛 Pylint (3.3.7)
[convention] 159-159: Class method from_json should have 'cls' as first argument
(C0202)
🤖 Prompt for AI Agents
In cognee/infrastructure/engine/models/DataPoint.py at lines 159 and 195, the class methods currently use 'self' as the first parameter, which violates Python conventions. Change the first parameter of these class methods from 'self' to 'cls' to correctly follow the class method parameter convention.
cognee/eval_framework/answer_generation/answer_generation_executor.py (1)
15-15: 🛠️ Refactor suggestion
Type annotation weakens type safety.
Changing from `Dict[str, BaseRetriever]` to `Dict[str, Any]` reduces type safety. If the new retrievers inherit from `BaseRetriever`, consider keeping the stronger typing or using a Union type.

```diff
-retriever_options: Dict[str, Any] = {
+retriever_options: Dict[str, BaseRetriever] = {
```

Alternatively, if some retrievers don't inherit from `BaseRetriever`, consider using a Union type or creating a common interface.

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```diff
-retriever_options: Dict[str, Any] = {
+retriever_options: Dict[str, BaseRetriever] = {
```

🤖 Prompt for AI Agents
In cognee/eval_framework/answer_generation/answer_generation_executor.py at line 15, the type annotation for retriever_options is currently Dict[str, Any], which weakens type safety. To fix this, change the annotation to Dict[str, BaseRetriever] if all retrievers inherit from BaseRetriever. If some retrievers do not inherit from BaseRetriever, use a Union type including BaseRetriever and other relevant types or define a common interface that all retrievers implement, then use that interface in the type annotation.
cognee/tasks/chunks/chunk_by_sentence.py (1)
36-52: 🛠️ Refactor suggestion
Incomplete docstring - missing Returns section.
The docstring provides excellent detail about the function's behavior and parameters, but appears to be missing the Returns section that should describe the `Iterator[Tuple[UUID, str, int, Optional[str]]]` return type.

Add the missing Returns section:

```diff
         generated. (default None)
+
+    Returns:
+    --------
+
+        - Iterator[Tuple[UUID, str, int, Optional[str]]]: An iterator yielding tuples containing:
+            - UUID: Unique identifier for the paragraph
+            - str: The sentence text
+            - int: The size of the sentence in tokens
+            - Optional[str]: The sentence type ('sentence_end', 'paragraph_end', 'sentence_cut', etc.)
     """
```

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```text
    Splits text into sentences while preserving word and paragraph boundaries.

    This function processes the input string, dividing it into sentences based on word-level
    tokenization. Each sentence is identified with a unique UUID, and it handles scenarios where
    the text may end mid-sentence by tagging it with a specific type. If a maximum sentence
    length is specified, the function ensures that sentences do not exceed this length, raising
    a ValueError if an individual word surpasses it. The function utilizes an external word
    processing function `chunk_by_word` to determine the structure of the text.

    Parameters:
    -----------

        - data (str): The input text to be split into sentences.
        - maximum_size (Optional[int]): An optional limit on the maximum size of sentences
          generated. (default None)

    Returns:
    --------

        - Iterator[Tuple[UUID, str, int, Optional[str]]]: An iterator yielding tuples containing:
            - UUID: Unique identifier for the paragraph
            - str: The sentence text
            - int: The size of the sentence in tokens
            - Optional[str]: The sentence type ('sentence_end', 'paragraph_end', 'sentence_cut', etc.)
```

🤖 Prompt for AI Agents
In cognee/tasks/chunks/chunk_by_sentence.py around lines 36 to 52, the function's docstring lacks a Returns section describing the return type. Add a Returns section that clearly states the function returns an Iterator of Tuples containing a UUID, a sentence string, an integer, and an optional string, to complete the documentation.
cognee/infrastructure/databases/vector/create_vector_engine.py (1)
42-49:
⚠️ Potential issue

Fix critical parameter name bug.

There's a typo in the adapter instantiation that will cause runtime errors.

```diff
     return adapter(
-        utl=vector_db_url,
+        url=vector_db_url,
         api_key=vector_db_key,
         embedding_engine=embedding_engine,
     )
```

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
    if vector_db_provider in supported_databases:
        adapter = supported_databases[vector_db_provider]

        return adapter(
            url=vector_db_url,
            api_key=vector_db_key,
            embedding_engine=embedding_engine,
        )
```

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/vector/create_vector_engine.py between lines 42 and 49, the parameter name 'utl' used in the adapter instantiation is a typo and should be corrected to 'url' to match the expected parameter name. Update the adapter call to use 'url=vector_db_url' instead of 'utl=vector_db_url' to fix the runtime error caused by this incorrect parameter name.
cognee/tasks/documents/classify_documents.py (1)
54-70: 🛠️ Refactor suggestion
Robustness: broaden the error guard in `update_node_set`.

`json.loads(document.external_metadata)` will also raise a `TypeError` if `external_metadata` is `None` – a scenario we occasionally hit when ingesting legacy records.

A tiny tweak makes the helper safer:

```diff
-    except json.JSONDecodeError:
+    except (json.JSONDecodeError, TypeError):
```

Optional, but prevents silent crashes during bulk imports.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```diff
 def update_node_set(document):
     """
     Extracts node_set from document's external_metadata.

     Parses the external_metadata of the given document and updates the document's
     belongs_to_set attribute with NodeSet objects generated from the node_set found in the
     external_metadata. If the external_metadata is not valid JSON, is not a dictionary, does
     not contain the 'node_set' key, or if node_set is not a list, the function has no effect
     and will return early.

     Parameters:
     -----------

         - document: The document object which contains external_metadata from which the
           node_set will be extracted.
     """
     try:
         metadata = json.loads(document.external_metadata)
         if not isinstance(metadata, dict):
             return

         node_set = metadata.get("node_set")
         if not isinstance(node_set, list):
             return

         document.belongs_to_set = [
             NodeSet.from_dict(node) for node in node_set
         ]
-    except json.JSONDecodeError:
+    except (json.JSONDecodeError, TypeError):
         return
```

🤖 Prompt for AI Agents
In cognee/tasks/documents/classify_documents.py around lines 54 to 70, the current error handling in update_node_set only catches JSONDecodeError when parsing document.external_metadata, but it can also raise a TypeError if external_metadata is None. To fix this, broaden the except clause to catch both JSONDecodeError and TypeError exceptions to prevent silent crashes during bulk imports with legacy records.
cognee/tasks/ingestion/migrate_relational_database.py (1)
100-108:
⚠️ Potential issue

Foreign-key filter uses the wrong column + identity operator bug.

We want to skip columns in the current table that serve as FKs. The column resides in `fk["column"]`, not `fk["ref_column"]`. `key is primary_key_col` compares object identity; use equality.

Patch:

```diff
-            foreign_keys.append(fk["ref_column"])
+            foreign_keys.append(fk["column"])

-            if key is primary_key_col or key in foreign_keys:
+            if key == primary_key_col or key in foreign_keys:
```

This prevents accidental creation of `ColumnValue` nodes for FK columns and avoids the brittle `is` comparison.

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```diff
         # Get foreign key columns to filter them out from column migration
         foreign_keys = []
         for fk in details.get("foreign_keys", []):
-            foreign_keys.append(fk["ref_column"])
+            foreign_keys.append(fk["column"])

         for key, value in row_properties.items():
             # Skip mapping primary key information to itself and mapping of foreign key information (as it will be mapped bellow)
-            if key is primary_key_col or key in foreign_keys:
+            if key == primary_key_col or key in foreign_keys:
                 continue
```

🧰 Tools
🪛 Pylint (3.3.7)
[convention] 106-106: Line too long (141/100)
(C0301)
🤖 Prompt for AI Agents
In cognee/tasks/ingestion/migrate_relational_database.py around lines 100 to 108, the foreign key filtering incorrectly uses fk["ref_column"] instead of fk["column"] to identify foreign key columns in the current table, and it uses the identity operator 'is' to compare key and primary_key_col, which should be an equality check. Fix this by changing foreign_keys to collect fk["column"] values and replace 'key is primary_key_col' with 'key == primary_key_col' to correctly skip primary and foreign key columns.
cognee/tasks/temporal_awareness/index_graphiti_objects.py (1)
34-34: 💡 Verification agent
🧩 Analysis chain
Verify the method name change is consistent across adapters.
The change from `get_model_independent_graph_data()` to `get_graph_data()` appears to align with the updated graph database adapters. Please ensure this method exists in all graph adapters.
🏁 Script executed:
```bash
#!/bin/bash
# Description: Verify that get_graph_data method exists in all graph adapters
# Expected: All adapters should implement get_graph_data method

# Search for get_graph_data method implementations
ast-grep --pattern $'async def get_graph_data($_) {
  $$$
}'

# Also check for any remaining get_model_independent_graph_data references
rg "get_model_independent_graph_data" -A 2
```

Length of output: 2104
Implement get_graph_data in all graph adapters
The adapters still define and invoke `get_model_independent_graph_data()` without providing `get_graph_data()`. Please update the following files to either rename the existing method or add a thin wrapper so that every adapter implements `async def get_graph_data(...)`:
- cognee/infrastructure/databases/graph/kuzu/adapter.py
- cognee/infrastructure/databases/graph/neo4j_driver/adapter.py
- cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py
For example, you can replace:
```diff
- async def get_model_independent_graph_data(self) -> Dict[str, List[str]]:
+ async def get_graph_data(self) -> Dict[str, List[str]]:
      """ ... """
      # existing implementation
```

Or add:

```python
async def get_graph_data(self, *args, **kwargs):
    return await self.get_model_independent_graph_data(*args, **kwargs)
```

so that the call in cognee/tasks/temporal_awareness/index_graphiti_objects.py line 34 resolves correctly.

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/kuzu/adapter.py, cognee/infrastructure/databases/graph/neo4j_driver/adapter.py, and cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py, ensure each adapter implements an async method named get_graph_data. This can be done by either renaming the existing get_model_independent_graph_data method to get_graph_data or by adding a new async get_graph_data method that calls and returns the result of get_model_independent_graph_data with the same arguments. This will align all adapters with the call made in cognee/tasks/temporal_awareness/index_graphiti_objects.py line 34.
cognee/infrastructure/llm/tokenizer/Gemini/adapter.py (1)
48-59:
⚠️ Potential issue

Fix parameter name inconsistency.

The method parameter should be named `token` to match the interface definition, not `encoding`. This will resolve the static analysis warning and maintain consistency with the base interface.

Apply this diff to fix the parameter name:

```diff
-    def decode_single_token(self, encoding: int):
+    def decode_single_token(self, token: int):
         """
         Raise NotImplementedError when called, as Gemini tokenizer does not support decoding of
         tokens.

         Parameters:
         -----------

-            - encoding (int): The token encoding to decode.
+            - token (int): The token encoding to decode.
         """
```

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
    def decode_single_token(self, token: int):
        """
        Raise NotImplementedError when called, as Gemini tokenizer does not support decoding of
        tokens.

        Parameters:
        -----------

            - token (int): The token encoding to decode.
        """
        # Gemini tokenizer doesn't have the option to decode tokens
        raise NotImplementedError
```

🧰 Tools
🪛 Pylint (3.3.7)
[warning] 48-48: Parameter 'token' has been renamed to 'encoding' in overriding 'GeminiTokenizer.decode_single_token' method
(W0237)
🤖 Prompt for AI Agents
In cognee/infrastructure/llm/tokenizer/Gemini/adapter.py around lines 48 to 59, rename the method parameter from 'encoding' to 'token' in the decode_single_token function to match the interface definition and resolve the static analysis warning. This change ensures consistency with the base interface without altering the method's behavior.
cognee/infrastructure/files/storage/StorageManager.py (1)
9-9:
⚠️ Potential issue

Fix type inconsistency between protocol and implementation.

There's a type mismatch between the `Storage` protocol and the `StorageManager` implementation:

- Protocol expects `data: bytes` (line 9)
- Implementation expects `data: BinaryIO` (line 60)
- LocalStorage actually accepts `Union[BinaryIO, str]` according to the relevant code snippets

This inconsistency could cause type checking issues and developer confusion.

Consider updating the protocol to match the actual usage:

```diff
-    def store(self, file_path: str, data: bytes):
+    def store(self, file_path: str, data: Union[BinaryIO, str]):
```

Or update the implementation to match the protocol if bytes-only is the intended interface.
Also applies to: 60-60
🤖 Prompt for AI Agents
In cognee/infrastructure/files/storage/StorageManager.py at lines 9 and 60, there is a type inconsistency between the Storage protocol and StorageManager implementation regarding the `data` parameter type. The protocol defines `data` as bytes, but the implementation uses BinaryIO, and LocalStorage accepts Union[BinaryIO, str]. To fix this, align the protocol and implementation types by either updating the protocol's `data` parameter to accept the broader type used in the implementation (e.g., BinaryIO or Union[BinaryIO, str]) or restrict the implementation to accept bytes only if that is the intended interface, ensuring consistent type annotations across all related classes.
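For illustration only, a minimal sketch of what the aligned annotations could look like, assuming the broader `Union[BinaryIO, str]` type is the intended contract; the real `Storage` protocol and `StorageManager` have more members than shown here:

```python
from typing import BinaryIO, Protocol, Union


class Storage(Protocol):
    # Match the broader type LocalStorage already accepts, so the protocol,
    # the manager, and the concrete backends stay in sync.
    def store(self, file_path: str, data: Union[BinaryIO, str]): ...


class StorageManager:
    def __init__(self, storage: Storage):
        self.storage = storage

    def store(self, file_path: str, data: Union[BinaryIO, str]):
        # Same annotation as the protocol, so mypy sees one consistent signature
        # across the whole call chain.
        return self.storage.store(file_path, data)
```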
cognee/infrastructure/llm/tokenizer/Mistral/adapter.py (1)
78-89:
⚠️ Potential issue

Fix parameter name inconsistency with interface.

The parameter name `encoding` doesn't match the interface expectation of `token`. This could cause issues when the method is called through the interface.

Apply this diff to fix the parameter name:

```diff
-    def decode_single_token(self, encoding: int):
+    def decode_single_token(self, token: int):
         """
         Attempt to decode a single token, although this functionality is not implemented and
         raises NotImplementedError.

         Parameters:
         -----------

-            - encoding (int): The integer representation of the token to decode.
+            - token (int): The integer representation of the token to decode.
         """
         # Mistral tokenizer doesn't have the option to decode tokens
         raise NotImplementedError
```

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
    def decode_single_token(self, token: int):
        """
        Attempt to decode a single token, although this functionality is not implemented and
        raises NotImplementedError.

        Parameters:
        -----------

            - token (int): The integer representation of the token to decode.
        """
        # Mistral tokenizer doesn't have the option to decode tokens
        raise NotImplementedError
```

🧰 Tools
🪛 Pylint (3.3.7)
[warning] 78-78: Parameter 'token' has been renamed to 'encoding' in overriding 'MistralTokenizer.decode_single_token' method
(W0237)
🤖 Prompt for AI Agents
In cognee/infrastructure/llm/tokenizer/Mistral/adapter.py around lines 78 to 89, the method decode_single_token uses the parameter name 'encoding' which is inconsistent with the interface that expects 'token'. Rename the parameter from 'encoding' to 'token' to match the interface and avoid potential issues when the method is called through the interface.
cognee/api/v1/responses/routers/default_tools.py (1)
1-86: 🛠️ Refactor suggestion
Well-structured tool definitions with room for security improvements.
The tool definitions follow OpenAI function calling standards and provide comprehensive parameter specifications. However, consider adding input validation constraints for security.
Consider these security enhancements:
"search_query": { "type": "string", "description": "The query to search for in the knowledge graph", + "maxLength": 1000, + "pattern": "^[\\w\\s\\-.,!?()]+$" },"text": { "type": "string", "description": "Text content to be converted into a knowledge graph", + "maxLength": 50000 },"graph_model_file": { "type": "string", "description": "Path to a custom graph model file", + "pattern": "^[\\w\\-./]+\\.(json|yaml|yml)$" },These constraints help prevent injection attacks and ensure reasonable input sizes.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
DEFAULT_TOOLS = [
    {
        "type": "function",
        "name": "search",
        "description": "Search for information within the knowledge graph",
        "parameters": {
            "type": "object",
            "properties": {
                "search_query": {
                    "type": "string",
                    "description": "The query to search for in the knowledge graph",
                    "maxLength": 1000,
                    "pattern": "^[\\w\\s\\-.,!?()]+$",
                },
                "search_type": {
                    "type": "string",
                    "description": "Type of search to perform",
                    "enum": [
                        "INSIGHTS",
                        "CODE",
                        "GRAPH_COMPLETION",
                        "SEMANTIC",
                        "NATURAL_LANGUAGE",
                    ],
                },
                "top_k": {
                    "type": "integer",
                    "description": "Maximum number of results to return",
                    "default": 10,
                },
                "datasets": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Optional list of dataset names to search within",
                },
            },
            "required": ["search_query"],
        },
    },
    {
        "type": "function",
        "name": "cognify",
        "description": "Convert text into a knowledge graph or process all added content",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "Text content to be converted into a knowledge graph",
                    "maxLength": 50000,
                },
                "graph_model_name": {
                    "type": "string",
                    "description": "Name of the graph model to use",
                },
                "graph_model_file": {
                    "type": "string",
                    "description": "Path to a custom graph model file",
                    "pattern": "^[\\w\\-./]+\\.(json|yaml|yml)$",
                },
            },
        },
    },
    {
        "type": "function",
        "name": "prune",
        "description": "Remove unnecessary or outdated information from the knowledge graph",
        "parameters": {
            "type": "object",
            "properties": {
                "prune_strategy": {
                    "type": "string",
                    "enum": ["light", "moderate", "aggressive"],
                    "description": "Strategy for pruning the knowledge graph",
                    "default": "moderate",
                },
                "min_confidence": {
                    "type": "number",
                    "description": "Minimum confidence score to retain (0-1)",
                    "minimum": 0,
                    "maximum": 1,
                },
                "older_than": {
                    "type": "string",
                    "description": "ISO date string - prune nodes older than this date",
                },
            },
        },
    },
]
```

🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/default_tools.py within lines 1 to 86, the tool definitions lack input validation constraints that can prevent injection attacks and control input sizes. Add constraints such as maxLength for string fields like "search_query", "text", "graph_model_name", and "graph_model_file"; minLength where appropriate; and maxItems for arrays like "datasets". Also, consider pattern restrictions if applicable to further validate inputs. These additions will enhance security by limiting input size and format.
cognee/api/v1/responses/routers/get_responses_router.py (4)
54-55: 🛠️ Refactor suggestion
Remove hardcoded model override or document the reasoning.
The TODO comment indicates this is temporary, but hardcoding the model to "gpt-4o" regardless of the request parameter could confuse users.
Either implement proper model support or document why this override is necessary:
```diff
-    # TODO: Support other models (e.g. cognee-v1-openai-gpt-3.5-turbo, etc.)
-    model = "gpt-4o"
+    # Currently only gpt-4o is supported for responses API
+    if model != "gpt-4o":
+        logger.warning(f"Model {model} not supported, using gpt-4o")
+        model = "gpt-4o"
```

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
    # Currently only gpt-4o is supported for responses API
    if model != "gpt-4o":
        logger.warning(f"Model {model} not supported, using gpt-4o")
        model = "gpt-4o"
```

🧰 Tools
🪛 Pylint (3.3.7)
[warning] 54-54: TODO: Support other models (e.g. cognee-v1-openai-gpt-3.5-turbo, etc.)
(W0511)
🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/get_responses_router.py around lines 54 to 55, the model variable is hardcoded to "gpt-4o" overriding any request parameter, which can confuse users. To fix this, either remove the hardcoded assignment so the model parameter from the request is used as intended, or if the override is necessary temporarily, add a clear comment explaining why this is done and when it will be removed or replaced with proper model support.
43-49:
⚠️ Potential issue

Fix dangerous default value and improve parameter handling.
Using a mutable default value can lead to unexpected behavior when the list is modified.
Apply this diff to fix the dangerous default:
```diff
- async def call_openai_api_for_model(
-     input_text: str,
-     model: str,
-     tools: Optional[List[Dict[str, Any]]] = DEFAULT_TOOLS,
-     tool_choice: Any = "auto",
-     temperature: float = 1.0,
- ) -> Dict[str, Any]:
+ async def call_openai_api_for_model(
+     input_text: str,
+     model: str,
+     tools: Optional[List[Dict[str, Any]]] = None,
+     tool_choice: Any = "auto",
+     temperature: float = 1.0,
+ ) -> Dict[str, Any]:
```

Then handle the None case in the function:

```diff
+    if tools is None:
+        tools = DEFAULT_TOOLS
```

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
async def call_openai_api_for_model(
    input_text: str,
    model: str,
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: Any = "auto",
    temperature: float = 1.0,
) -> Dict[str, Any]:
    if tools is None:
        tools = DEFAULT_TOOLS
    # …rest of function…
```

🧰 Tools
🪛 Pylint (3.3.7)
[warning] 43-43: Dangerous default value DEFAULT_TOOLS (builtins.list) as argument
(W0102)
🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/get_responses_router.py around lines 43 to 49, the function call_openai_api_for_model uses a mutable default argument DEFAULT_TOOLS for the parameter tools, which can cause unexpected behavior if the list is modified. To fix this, change the default value of tools to None and inside the function, check if tools is None and if so, assign it to DEFAULT_TOOLS. This prevents sharing the same list instance across function calls and ensures safer parameter handling.
72-75:
⚠️ Potential issue

Fix dependency injection pattern and remove unused parameter.

The `Depends` call in the default argument should be avoided, and the unused `user` parameter should be handled properly.

Apply this diff to fix the dependency injection:

```diff
 @router.post("/", response_model=ResponseBody)
 async def create_response(
     request: ResponseRequest,
-    user: User = Depends(get_authenticated_user),
+    user: User = Depends(get_authenticated_user),
 ) -> ResponseBody:
```

If the user parameter is required for authentication but not used in the function body, add a comment explaining this:

```diff
     user: User = Depends(get_authenticated_user),
 ) -> ResponseBody:
     """
     OpenAI-compatible responses endpoint with function calling support
     """
+    # User parameter ensures authentication but is not used in processing
```

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.11.9)
74-74: Do not perform function call `Depends` in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable
(B008)
🪛 Pylint (3.3.7)
[refactor] 72-72: Too many local variables (18/15)
(R0914)
[warning] 74-74: Unused argument 'user'
(W0613)
🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/get_responses_router.py around lines 72 to 75, the function create_response uses Depends in the default argument for the user parameter, which is not recommended, and the user parameter is unused. Remove the user parameter from the function signature if it is not needed, or if it is required for authentication side effects but not used, keep it but add a comment explaining why it is present. Avoid using Depends directly in default arguments and instead use it in the function parameters properly.
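If the goal is to silence Ruff B008 while keeping the authentication side effect, FastAPI's `Annotated` dependency style is one option. The sketch below uses placeholder models and a stub dependency rather than cognee's real `User` / `get_authenticated_user`, so treat it as an illustration of the pattern only:

```python
from typing import Annotated

from fastapi import APIRouter, Depends
from pydantic import BaseModel


# Placeholder models and dependency, standing in for the real cognee ones.
class ResponseRequest(BaseModel):
    input: str


class ResponseBody(BaseModel):
    output: str


class User(BaseModel):
    id: str


async def get_authenticated_user() -> User:
    return User(id="demo")


router = APIRouter()


@router.post("/", response_model=ResponseBody)
async def create_response(
    request: ResponseRequest,
    # Annotated moves the Depends() call out of the argument default, which
    # satisfies Ruff B008 while keeping the authentication side effect.
    user: Annotated[User, Depends(get_authenticated_user)],
) -> ResponseBody:
    # `user` is only required so the dependency runs; it is not otherwise used.
    return ResponseBody(output=request.input)
```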
114-122: 🛠️ Refactor suggestion
Improve exception handling specificity.
Catching `Exception` is too broad and may hide important error details.

Apply this diff for more specific exception handling:

```diff
     # Dispatch the function
     try:
         function_result = await dispatch_function(tool_call)
         output_status = "success"
-    except Exception as e:
-        logger.exception(f"Error executing function {function_name}: {e}")
+    except (ValueError, TypeError, KeyError) as e:
+        logger.exception("Error executing function %s: %s", function_name, e)
+        function_result = f"Error executing {function_name}: {str(e)}"
+        output_status = "error"
+    except Exception as e:
+        logger.exception("Unexpected error executing function %s: %s", function_name, e)
         function_result = f"Error executing {function_name}: {str(e)}"
         output_status = "error"
+        # Re-raise unexpected errors after logging
+        raise
```

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
    # Dispatch the function
    try:
        function_result = await dispatch_function(tool_call)
        output_status = "success"
    except (ValueError, TypeError, KeyError) as e:
        logger.exception("Error executing function %s: %s", function_name, e)
        function_result = f"Error executing {function_name}: {str(e)}"
        output_status = "error"
    except Exception as e:
        logger.exception("Unexpected error executing function %s: %s", function_name, e)
        function_result = f"Error executing {function_name}: {str(e)}"
        output_status = "error"
        # Re-raise unexpected errors after logging
        raise
```

🧰 Tools
🪛 Pylint (3.3.7)
[warning] 118-118: Catching too general exception Exception
(W0718)
[warning] 119-119: Use lazy % formatting in logging functions
(W1203)
🤖 Prompt for AI Agents
In cognee/api/v1/responses/routers/get_responses_router.py around lines 114 to 122, the current code catches a broad Exception which can obscure specific error types. Refine the exception handling by catching more specific exceptions relevant to dispatch_function, such as asyncio.TimeoutError or any known custom exceptions it may raise. This will improve error clarity and handling precision. Adjust the except blocks accordingly to handle these specific exceptions before a general fallback if necessary.
cognee/infrastructure/databases/exceptions/exceptions.py (1)
69-86: 🛠️ Refactor suggestion
New exception class follows consistent patterns but has a similar issue with base class initialization.
The `NodesetFilterNotSupportedError` class is well-documented and serves a clear purpose. However, like `EntityNotFoundError`, it doesn't call `super().__init__()`, which bypasses the base class logging and initialization logic.

Consider maintaining consistency with the base class pattern. If logging needs to be disabled for certain scenarios, consider adding a parameter to control it:

```diff
     def __init__(
         self,
         message: str = "The nodeset filter is not supported in the current graph database.",
         name: str = "NodeSetFilterNotSupportedError",
         status_code=status.HTTP_404_NOT_FOUND,
     ):
-        self.message = message
-        self.name = name
-        self.status_code = status_code
+        super().__init__(message, name, status_code, log=False)
```

This approach maintains inheritance patterns while allowing selective logging control.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
class NodesetFilterNotSupportedError(CogneeApiError):
    """
    Raise an exception when a nodeset filter is not supported by the current database.

    This exception inherits from `CogneeApiError` and is designed to provide information about
    the specific issue of unsupported nodeset filters in the context of graph databases.
    """

    def __init__(
        self,
        message: str = "The nodeset filter is not supported in the current graph database.",
        name: str = "NodeSetFilterNotSupportedError",
        status_code=status.HTTP_404_NOT_FOUND,
    ):
        super().__init__(message, name, status_code, log=False)
```

🧰 Tools
🪛 Pylint (3.3.7)
[warning] 78-78: init method from base class 'CogneeApiError' is not called
(W0231)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/exceptions/exceptions.py around lines 69 to 86, the NodesetFilterNotSupportedError class does not call super().__init__(), which skips the base class initialization and logging logic. To fix this, modify the __init__ method to call super().__init__() with the appropriate parameters, optionally adding a parameter to control logging if needed, ensuring consistent inheritance and proper base class setup.
cognee/infrastructure/databases/graph/graph_db_interface.py (2)
93-96: 🛠️ Refactor suggestion
Log level & broad exception handling might hide production errors
Downgrading all exceptions to `debug` risks silently swallowing issues that corrupt the relationship ledger. Consider:

- Logging at `warning` (or `error`) when persistence fails.
- Narrowing the `except` clause (`Exception` → expected DB errors).
- Surfacing failures to callers (optional retry / circuit-breaker).

(A hedged sketch of this shape follows the AI-agent prompt below.)
Keeping visibility high is critical for audit trails.
Also applies to: 120-121
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/graph_db_interface.py around lines 93 to 96, the current exception handling logs all errors at debug level and catches broad exceptions, which may hide critical issues. Change the log level to warning or error to ensure visibility of persistence failures, narrow the except clause to catch only expected database-related exceptions, and consider re-raising the exception or implementing a retry mechanism to surface failures to callers. Apply similar changes to lines 120-121 for consistency.
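A minimal sketch of the narrowed handling; `GraphPersistenceError` and the logger wiring are illustrative assumptions, not the adapters' actual exception types:

```python
import logging

logger = logging.getLogger(__name__)


class GraphPersistenceError(Exception):
    """Stand-in for the driver-specific errors the graph adapters actually raise."""


async def persist_relationship(save, payload):
    try:
        await save(payload)
    except GraphPersistenceError as error:
        # Warning/error level keeps persistence failures visible in audit trails,
        # instead of burying them at debug level.
        logger.warning("Failed to persist relationship ledger entry: %s", error)
        # Surface the failure so callers can retry or trip a circuit breaker.
        raise
```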
353-366:
⚠️ Potential issue

Return-type uses `int` IDs while the rest of the interface uses `str`.

`Node`, `EdgeData`, and concrete adapters (e.g., `kuzu`, `neo4j`) all employ `str` for node IDs. The signature below introduces `int`, breaking static typing and causing mypy / IDE warnings.

```diff
-    ) -> Tuple[List[Tuple[int, dict]], List[Tuple[int, int, str, dict]]]:
+    ) -> Tuple[List[Tuple[str, dict]], List[Tuple[str, str, str, dict]]]:
```

Please align the type hints and docstring accordingly, or justify the divergent type with explicit conversion logic.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
    @abstractmethod
    async def get_nodeset_subgraph(
        self, node_type: Type[Any], node_name: List[str]
    ) -> Tuple[List[Tuple[str, dict]], List[Tuple[str, str, str, dict]]]:
        """
        Fetch a subgraph consisting of a specific set of nodes and their relationships.

        Parameters:
        -----------

            - node_type (Type[Any]): The type of nodes to include in the subgraph.
            - node_name (List[str]): A list of names of the nodes to include in the subgraph.
        """
        raise NotImplementedError
```

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/graph_db_interface.py around lines 353 to 366, the method get_nodeset_subgraph uses int for node IDs in its return type, while the rest of the interface and implementations use str for node IDs. To fix this, change the return type hints from int to str for node IDs in both the node tuples and edge tuples. Also update the docstring to reflect that node IDs are strings, ensuring consistency with the rest of the interface and avoiding static typing conflicts.
cognee/infrastructure/databases/graph/get_graph_engine.py (1)
79-87: 🛠️ Refactor suggestion
Adapter instantiation may omit mandatory parameters
The generic branch instantiates adapters with only `url` / `username` / `password`. Adapters such as `KuzuAdapter` (requires `db_path`) or future adapters needing `port` / `file_path` will break:

```python
adapter = supported_databases[graph_database_provider]

return adapter(  # missing graph_file_path / port
    graph_database_url=graph_database_url,
    graph_database_username=graph_database_username,
    graph_database_password=graph_database_password,
)
```

Consider forwarding all kwargs or registering lambdas/partial objects in `supported_databases` that satisfy each constructor. A rough sketch of the registry idea follows the AI-agent prompt below.

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/get_graph_engine.py around lines 79 to 87, the adapter instantiation only passes url, username, and password, which breaks adapters requiring additional parameters like db_path or port. To fix this, modify the code to forward all relevant keyword arguments (kwargs) to the adapter constructor or update supported_databases to store factory functions (e.g., lambdas or partials) that supply the correct parameters for each adapter type, ensuring all mandatory parameters are provided during instantiation.
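One possible shape for the factory-registration idea — the adapter classes and constructor parameters here are illustrative stand-ins, not the actual cognee adapter signatures:

```python
# Each registry entry is a zero-argument factory that already knows which
# constructor arguments its adapter needs, so the generic branch never has
# to guess a common signature.
class Neo4jAdapter:
    def __init__(self, graph_database_url, graph_database_username, graph_database_password):
        self.url = graph_database_url
        self.username = graph_database_username
        self.password = graph_database_password


class KuzuAdapter:
    def __init__(self, db_path):
        self.db_path = db_path


def create_graph_engine(provider, url=None, username=None, password=None, db_path=None):
    supported_databases = {
        "neo4j": lambda: Neo4jAdapter(
            graph_database_url=url,
            graph_database_username=username,
            graph_database_password=password,
        ),
        "kuzu": lambda: KuzuAdapter(db_path=db_path),
    }
    return supported_databases[provider]()


# Usage: create_graph_engine("kuzu", db_path="/tmp/graph.kuzu")
```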
cognee/modules/retrieval/graph_completion_cot_retriever.py (2)
74-80:
⚠️ Potential issue

Return type becomes a nested list.

`answer` is already a list (initialised at L79 and assigned from `generate_completion`). Wrapping it again (`return [answer]`) yields `List[List[str]]`, which is unlikely what callers expect and breaks type hints.

```diff
-        return [answer]
+        return answer
```

Also applies to: 125-125
🤖 Prompt for AI Agents
In cognee/modules/retrieval/graph_completion_cot_retriever.py around lines 74 to 80 and line 125, the variable 'answer' is already a list of strings, but the code wraps it again in another list when returning, causing the return type to be a nested list (List[List[str]]). To fix this, remove the extra list wrapping in the return statement so that 'answer' is returned directly as a List[str], matching the expected type hints and avoiding type errors.
81-88:
⚠️ Potential issue

Off-by-one: `max_iter + 1` executes one extra round.

`range(max_iter + 1)` performs max_iter + 1 iterations while the docstring says "maximum number of iterations … (default 4)". Either make the loop `range(max_iter)` or clarify the docstring.

```diff
-        for round_idx in range(max_iter + 1):
+        for round_idx in range(max_iter):
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/modules/retrieval/graph_completion_cot_retriever.py around lines 81 to 88, the loop uses range(max_iter + 1) which causes one extra iteration beyond the intended maximum. To fix this, change the loop to range(max_iter) to ensure it runs exactly max_iter times as described in the docstring, or alternatively update the docstring to reflect the current behavior if the extra iteration is intentional.
cognee/modules/retrieval/graph_completion_context_extension_retriever.py (2)
1-8:
⚠️ Potential issue

Remove unused imports to satisfy Ruff/Pylint and avoid dead code.

`get_llm_client`, `read_query_prompt`, and `render_prompt` are imported but never referenced. Keeping stale imports bloats byte-code, hurts import-time perf, and now fails Ruff (F401) & Pylint (W0611).

```diff
-from cognee.infrastructure.llm.get_llm_client import get_llm_client
-from cognee.infrastructure.llm.prompts import read_query_prompt, render_prompt
```

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
from typing import Any, Optional, List, Type

from cognee.shared.logging_utils import get_logger
from cognee.modules.retrieval.graph_completion_retriever import GraphCompletionRetriever
from cognee.modules.retrieval.utils.completion import generate_completion

logger = get_logger()
```

🧰 Tools
🪛 Ruff (0.11.9)
3-3:
`cognee.infrastructure.llm.get_llm_client.get_llm_client` imported but unused
Remove unused import: `cognee.infrastructure.llm.get_llm_client.get_llm_client`
(F401)
6-6: `cognee.infrastructure.llm.prompts.read_query_prompt` imported but unused
Remove unused import
(F401)
6-6: `cognee.infrastructure.llm.prompts.render_prompt` imported but unused
Remove unused import
(F401)
🪛 Pylint (3.3.7)
[warning] 3-3: Unused get_llm_client imported from cognee.infrastructure.llm.get_llm_client
(W0611)
[warning] 6-6: Unused read_query_prompt imported from cognee.infrastructure.llm.prompts
(W0611)
[warning] 6-6: Unused render_prompt imported from cognee.infrastructure.llm.prompts
(W0611)
🤖 Prompt for AI Agents
In cognee/modules/retrieval/graph_completion_context_extension_retriever.py lines 1 to 8, remove the unused imports get_llm_client, read_query_prompt, and render_prompt since they are not referenced anywhere in the file. This will satisfy Ruff and Pylint by eliminating dead code and improve import-time performance.
110-117: 🛠️ Refactor suggestion
Return signature mismatch – should return a string, not a single-item list
Callers of `get_completion` will likely expect the completion text, yet the method wraps it in a list (`return [answer]`). Unless a higher-level protocol mandates a list, simplify:

```diff
-        return [answer]
+        return answer
```

Align the declared return type accordingly (`-> str`).

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
        answer = await generate_completion(
            query=query,
            context=context,
            user_prompt_path=self.user_prompt_path,
            system_prompt_path=self.system_prompt_path,
        )

        return answer
```

🤖 Prompt for AI Agents
In cognee/modules/retrieval/graph_completion_context_extension_retriever.py around lines 110 to 117, the method get_completion currently returns a single-item list containing the answer, but it should return just the string answer itself. Remove the list brackets around answer in the return statement so it returns a string, and update the method's return type annotation to -> str to reflect this change.
cognee/modules/pipelines/operations/pipeline.py (2)
113-135: 🛠️ Refactor suggestion
“Ugly hack” block risks data inconsistency & merits extraction
The manual `uuid5` reconstruction and double `log_pipeline_run_initiated` calls (lines 116-134) suggest a workaround for stale status logic.

Problems
• Divergent `dataset_id` vs `dataset.id` (reset on 135) may desynchronise logs.
• Hard-coded pipeline names hinder future refactors.

Recommendation
Move this logic into `log_pipeline_run_initiated` or a dedicated helper that returns the consistent `dataset_id` used throughout the call, eliminating the re-assignment surprise. A hedged sketch of such a helper follows the AI-agent prompt below.

🧰 Tools
🪛 Pylint (3.3.7)
[convention] 118-118: Line too long (112/100)
(C0301)
[convention] 128-128: Line too long (116/100)
(C0301)
🤖 Prompt for AI Agents
In cognee/modules/pipelines/operations/pipeline.py around lines 113 to 135, the current code manually reconstructs dataset_id using uuid5 and makes two separate calls to log_pipeline_run_initiated with hard-coded pipeline names, then resets dataset_id to dataset.id, causing potential data inconsistency and maintenance issues. Refactor by extracting this logic into a dedicated helper function or incorporate it into log_pipeline_run_initiated so that it consistently computes and returns the dataset_id used in all calls, avoiding the manual reassignment and hard-coded strings, ensuring consistent dataset_id usage and easier future refactoring.
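A rough sketch of the suggested helper, assuming the UUID-typed `log_pipeline_run_initiated` signature proposed earlier in this review; the `uuid5` derivation shown is illustrative and would need to mirror whatever the current block actually computes:

```python
from uuid import NAMESPACE_OID, UUID, uuid5

from cognee.modules.pipelines.operations.log_pipeline_run_initiated import (
    log_pipeline_run_initiated,
)


async def ensure_pipeline_run_initiated(pipeline_id: UUID, pipeline_name: str, dataset, user) -> UUID:
    # Derive the dataset_id once (illustrative derivation), log with it, and hand
    # the same value back so the caller never re-assigns dataset_id afterwards.
    dataset_id = dataset.id if hasattr(dataset, "id") else uuid5(NAMESPACE_OID, f"{dataset}{user.id}")
    await log_pipeline_run_initiated(
        pipeline_id=pipeline_id,
        pipeline_name=pipeline_name,
        dataset_id=dataset_id,
    )
    return dataset_id
```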
64-92: 🛠️ Refactor suggestion
Dataset resolution loop is O(n²) – convert lookup to hash-table
For each requested name you iterate over every existing dataset (nested loops). With large tenant datasets this scales poorly.

Optimised sketch:

```diff
-    existing_datasets = await get_datasets(user.id)
-    ...
-    for dataset_name in datasets:
-        for existing_dataset in existing_datasets:
+    existing_by_name = {d.name: d for d in existing_datasets}
+    existing_by_id = {str(d.id): d for d in existing_datasets}
+    for dataset_name in datasets:
+        existing_dataset = existing_by_name.get(dataset_name) or existing_by_id.get(dataset_name)
+        if existing_dataset:
+            dataset_instances.append(existing_dataset)
+            continue
```

This trims complexity to O(n).
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/modules/pipelines/operations/pipeline.py between lines 64 and 92, the current dataset resolution uses nested loops causing O(n²) complexity. To fix this, create a dictionary (hash table) mapping existing dataset names and IDs to their instances before the loop. Then, for each dataset_name, directly check this dictionary for existence to append the existing dataset or create a new one if not found. This change reduces the complexity to O(n).
cognee/infrastructure/llm/openai/adapter.py (1)
18-21:
⚠️ Potential issue
`observe` may be `None` – provide a safe no-op fallback decorator.

`get_observe()` can legitimately return `None` when no monitoring tool is configured. Applying `@None` as a decorator raises a `TypeError` during module import, breaking every runtime path that imports this adapter.

```diff
-from cognee.modules.observability.get_observe import get_observe
-
-observe = get_observe()
+from cognee.modules.observability.get_observe import get_observe
+
+def _noop(func):
+    return func
+
+# Use the real decorator if available; otherwise fall back to a no-op
+observe = get_observe() or (lambda *_, **__: _noop)
```

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
from cognee.modules.observability.get_observe import get_observe


def _noop(func):
    return func


# Use the real decorator if available; otherwise fall back to a no-op
observe = get_observe() or (lambda *_, **__: _noop)
```

🤖 Prompt for AI Agents
In cognee/infrastructure/llm/openai/adapter.py around lines 18 to 21, the variable observe assigned from get_observe() may be None, which causes a TypeError if used as a decorator. To fix this, check if observe is None and if so, assign it a no-op decorator function that simply returns the original function unchanged. This ensures that applying @observe will not fail even when no monitoring tool is configured.
cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py (1)
174-183:
⚠️ Potential issue

Manual runner invokes non-existent methods – will raise
AttributeError
`test.test_graph_completion_context_simple()` (etc.) does not exist – the defined
method names include `_cot_`. The manual block is unused by pytest but will fail
for anyone invoking the file directly.Either:
• rename the calls to the correct method names, or
• remove the `if __name__ == "__main__":` block entirely.

🧰 Tools
🪛 Pylint (3.3.7)
[error] 179-179: Instance of 'TestGraphCompletionRetriever' has no 'test_graph_completion_context_simple' member
(E1101)
[error] 180-180: Instance of 'TestGraphCompletionRetriever' has no 'test_graph_completion_context_complex' member
(E1101)
[error] 181-181: Instance of 'TestGraphCompletionRetriever' has no 'test_get_graph_completion_context_on_empty_graph' member
(E1101)
🤖 Prompt for AI Agents
In cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py around lines 174 to 183, the manual test runner calls methods that do not exist because the actual test method names include '_cot_'. To fix this, either rename the calls in the manual runner to match the correct method names with '_cot_' or remove the entire manual runner block to prevent AttributeError when running the file directly.
cognee/eval_framework/modal_run_eval.py (1)
80-97:
⚠️ Potential issue
`html_output` may be undefined and files are opened without encoding.

`html_output` is only set inside the `if eval_params.get("dashboard")` block yet written unconditionally – this raises `UnboundLocalError` when `dashboard` is falsy. `open()` defaults to the platform encoding; specify `encoding="utf-8"` for deterministic behaviour.

```diff
-    with open("/data/" + answers_filename, "w") as f:
+    with open("/data/" + answers_filename, "w", encoding="utf-8") as f:
         json.dump(answers, f, ensure_ascii=False, indent=4)
@@
-    if eval_params.get("dashboard"):
+    if eval_params.get("dashboard"):
         ...
+        with open("/data/" + html_filename, "w", encoding="utf-8") as f:
+            f.write(html_output)
+        vol.commit()
```

Also place the second `open()` inside the `if` block to avoid the undefined-variable issue.

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Pylint (3.3.7)
[warning] 80-80: Using open without explicitly specifying an encoding
(W1514)
[warning] 93-93: Using open without explicitly specifying an encoding
(W1514)
[error] 94-94: Possibly using variable 'html_output' before assignment
(E0606)
🤖 Prompt for AI Agents
In cognee/eval_framework/modal_run_eval.py lines 80 to 97, the variable html_output is assigned only inside the if eval_params.get("dashboard") block but used outside it, causing an UnboundLocalError if the condition is false. Also, the open() calls lack explicit encoding, which can lead to inconsistent behavior across platforms. To fix this, move the second open() call that writes html_output inside the if block and add encoding="utf-8" to both open() calls to ensure consistent file encoding.
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (2)
620-620:
⚠️ Potential issue

Fix method name typo.

The static analysis correctly identifies that `get_neighbours` should be `get_neighbors` to match the actual method name defined in this class.

```diff
-        return await self.get_neighbours(node_id)
+        return await self.get_neighbors(node_id)
```

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
        return await self.get_neighbors(node_id)
```

🧰 Tools
🪛 Pylint (3.3.7)
[error] 620-620: Instance of 'Neo4jAdapter' has no 'get_neighbours' member; maybe 'get_neighbors'?
(E1101)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/neo4j_driver/adapter.py at line 620, the method name called is get_neighbours, which is a typo. Change this to get_neighbors to match the actual method name defined in the class and ensure the call works correctly.
778-778: 🛠️ Refactor suggestion
Fix dangerous default mutable argument.
Using `dict()` as a default argument is dangerous because the same dictionary instance is shared across all function calls, potentially causing unexpected side effects.

```diff
-    def serialize_properties(self, properties=dict()):
+    def serialize_properties(self, properties=None):
         """
         Convert properties of a node or edge into a serializable format suitable for storage.

         Parameters:
         -----------

             - properties: A dictionary of properties to serialize, defaults to an empty
               dictionary. (default dict())

         Returns:
         --------

             A dictionary with serialized property values.
         """
+        if properties is None:
+            properties = {}
         serialized_properties = {}
```

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
    def serialize_properties(self, properties=None):
        """
        Convert properties of a node or edge into a serializable format suitable for storage.

        Parameters:
        -----------

            - properties: A dictionary of properties to serialize, defaults to an empty
              dictionary. (default dict())

        Returns:
        --------

            A dictionary with serialized property values.
        """
        if properties is None:
            properties = {}
        serialized_properties = {}
```

🧰 Tools
🪛 Ruff (0.11.9)
778-778: Do not use mutable data structures for argument defaults
Replace with `None`; initialize within function
(B006)
🪛 Pylint (3.3.7)
[warning] 778-778: Dangerous default value dict() (builtins.dict) as argument
(W0102)
[refactor] 778-778: Consider using '{}' instead of a call to 'dict'.
(R1735)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/neo4j_driver/adapter.py at line 778, the method serialize_properties uses a mutable default argument dict(), which can lead to shared state across calls. Change the default argument to None and inside the method initialize properties to an empty dictionary if it is None to avoid this issue.
cognee/tests/unit/modules/retrieval/graph_completion_retriever_context_extension_test.py (2)
176-185:
⚠️ Potential issue
`__main__` runner is broken and will raise `AttributeError`.

The manual runner calls non-existent method names (`test_graph_completion_context_simple` vs the actual `test_graph_completion_extension_context_simple`, etc.). Besides being incorrect, the block is redundant because `pytest` is the authoritative test runner.

```diff
-if __name__ == "__main__":
-    from asyncio import run
-
-    test = TestGraphCompletionRetriever()
-
-    async def main():
-        await test.test_graph_completion_context_simple()
-        await test.test_graph_completion_context_complex()
-        await test.test_get_graph_completion_context_on_empty_graph()
-
-    run(main())
+# The explicit asyncio runner is unnecessary; invoke `pytest` instead.
+# If you want manual execution, ensure the method names match exactly:
+#     await test.test_graph_completion_extension_context_simple()
+# but ideally delete this whole block.
```

📝 Committable suggestion

‼️ IMPORTANT

Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
# The explicit asyncio runner is unnecessary; invoke `pytest` instead.
# If you really need a manual entrypoint, make sure the test methods exist:
#     await test.test_graph_completion_extension_context_simple()
# but in most cases it's best to delete this whole block and let pytest do its job.
```

🧰 Tools
🪛 Pylint (3.3.7)
[error] 181-181: Instance of 'TestGraphCompletionRetriever' has no 'test_graph_completion_context_simple' member
(E1101)
[error] 182-182: Instance of 'TestGraphCompletionRetriever' has no 'test_graph_completion_context_complex' member
(E1101)
[error] 183-183: Instance of 'TestGraphCompletionRetriever' has no 'test_get_graph_completion_context_on_empty_graph' member
(E1101)
🤖 Prompt for AI Agents
In cognee/tests/unit/modules/retrieval/graph_completion_retriever_context_extension_test.py around lines 176 to 185, the __main__ runner calls test methods with incorrect names that do not exist, causing AttributeError. Remove this manual runner block entirely since it is redundant and pytest should be used as the test runner instead.
18-30: 🛠️ Refactor suggestion
Use tmp-based directories to prevent cross-test interference
All three tests write to hard-coded `.cognee_system` / `.data_storage` sub-folders under the test directory. When tests are executed in parallel (pytest-xdist, CI matrix, etc.) they will race on the same folders, causing flaky failures and sporadic `DatabaseAlreadyExists` / permission errors. Prefer the `tmp_path` / `tmp_path_factory` fixtures (or `tempfile.TemporaryDirectory`) so each test gets an isolated workspace.

```diff
-    system_directory_path = os.path.join(
-        pathlib.Path(__file__).parent, ".cognee_system/test_graph_context"
-    )
+    system_directory_path = tmp_path_factory.mktemp("cognee_system")
```

(Apply to every path that is currently statically concatenated.)
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/tests/unit/modules/retrieval/graph_completion_retriever_context_extension_test.py around lines 18 to 30, the test uses hard-coded directories for system and data storage which can cause race conditions and flaky failures when tests run in parallel. Modify the code to use pytest's tmp_path or tmp_path_factory fixtures to create temporary, unique directories for system_root_directory and data_root_directory instead of static paths. This ensures each test runs in an isolated workspace and prevents interference.
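For illustration, a hedged sketch of how pytest's built-in fixtures can provide isolated directories; the fixture and directory names are hypothetical, and wiring the paths into cognee's configuration is left out:

```python
import pytest


@pytest.fixture
def isolated_storage(tmp_path_factory):
    # Each test gets throw-away directories, so parallel runs
    # (pytest-xdist, CI matrices) cannot race on shared folders.
    system_dir = tmp_path_factory.mktemp("cognee_system")
    data_dir = tmp_path_factory.mktemp("data_storage")
    return system_dir, data_dir


def test_uses_isolated_dirs(isolated_storage):
    system_dir, data_dir = isolated_storage
    assert system_dir.exists() and data_dir.exists()
```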
cognee/tasks/repo_processor/get_local_dependencies.py (1)
233-234: 🛠️ Refactor suggestion
Mutable default argument can lead to cross-call leakage
`existing_nodes: list[DataPoint] = {}` uses a shared dict between invocations.

```diff
-    tree_root: Node, script_path: str, existing_nodes: list[DataPoint] = {}
+    tree_root: Node,
+    script_path: str,
+    existing_nodes: Optional[dict[str, DataPoint]] = None,
 ):
@@
-    for child_node in tree_root.children:
+    if existing_nodes is None:
+        existing_nodes = {}
+
+    for child_node in tree_root.children:
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
async def get_local_dependencies(
    tree_root: Node,
    script_path: str,
    existing_nodes: Optional[dict[str, DataPoint]] = None,
) -> AsyncGenerator[DataPoint, None]:
    if existing_nodes is None:
        existing_nodes = {}

    for child_node in tree_root.children:
        …
```

🧰 Tools
🪛 Ruff (0.11.9)

233-233: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function. (B006)
🤖 Prompt for AI Agents
In cognee/tasks/repo_processor/get_local_dependencies.py around lines 233 to 234, the function uses a mutable default argument existing_nodes set to an empty dictionary, which can cause data to persist across calls unexpectedly. Change the default value of existing_nodes to None and inside the function initialize it to an empty dictionary if it is None to avoid cross-call data leakage.
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1)
68-77: 🛠️ Refactor suggestion
Silently swallowing `EntityNotFoundError` hides real problems

A bare `try / except …: pass` will mask unexpected states and makes debugging difficult. Using `contextlib.suppress` conveys intent better and keeps the block readable, but logging once is still valuable.

```diff
+import contextlib
 ...
-        try:
-            await memory_fragment.project_graph_from_db(
-                graph_engine,
-                node_properties_to_project=properties_to_project,
-                edge_properties_to_project=["relationship_name"],
-                node_type=node_type,
-                node_name=node_name,
-            )
-        except EntityNotFoundError:
-            pass
+        with contextlib.suppress(EntityNotFoundError):
+            await memory_fragment.project_graph_from_db(
+                graph_engine,
+                node_properties_to_project=properties_to_project,
+                edge_properties_to_project=["relationship_name"],
+                node_type=node_type,
+                node_name=node_name,
+            )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
# at the top of the file
import contextlib

# … later in your async function …
with contextlib.suppress(EntityNotFoundError):
    await memory_fragment.project_graph_from_db(
        graph_engine,
        node_properties_to_project=properties_to_project,
        edge_properties_to_project=["relationship_name"],
        node_type=node_type,
        node_name=node_name,
    )
```

🧰 Tools
🪛 Ruff (0.11.9)

68-77: Use `contextlib.suppress(EntityNotFoundError)` instead of `try`-`except`-`pass`. Replace with `contextlib.suppress(EntityNotFoundError)`. (SIM105)
🤖 Prompt for AI Agents
In cognee/modules/retrieval/utils/brute_force_triplet_search.py around lines 68 to 77, the current try-except block silently swallows EntityNotFoundError, which can hide real issues. Replace the bare except-pass with contextlib.suppress(EntityNotFoundError) to clearly convey intent, and add a single log statement to record when this exception occurs, improving readability and aiding debugging without interrupting flow.
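If both the intent of `contextlib.suppress` and the "log once" advice matter, one option is a tiny helper that suppresses like `suppress` but records the event. A sketch under assumptions: the exception class below is a stand-in, not the project's real import:

```python
import contextlib
import logging

logger = logging.getLogger(__name__)


class EntityNotFoundError(Exception):
    """Stand-in for the real exception; illustrative only."""


@contextlib.contextmanager
def suppress_and_log(*exceptions):
    # Behaves like contextlib.suppress for the given exception types,
    # but emits a debug log entry so the event is not silently lost.
    try:
        yield
    except exceptions as error:
        logger.debug("Suppressed expected error: %r", error)


with suppress_and_log(EntityNotFoundError):
    raise EntityNotFoundError("no matching entity in the graph")
```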
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
536-536: 💡 Verification agent
🧩 Analysis chain
Verify the existence of list_collections method.
The code calls `self.list_collections()` but this method is not defined in the class. This could cause a runtime error.
🏁 Script executed:
```bash
#!/bin/bash
# Description: Search for list_collections method definition in ChromaDBAdapter
ast-grep --pattern $'class ChromaDBAdapter {
  $$$
  def list_collections($$$) {
    $$$
  }
  $$$
}'
```

Length of output: 102
🏁 Script executed:
```bash
#!/bin/bash
# Search for the definition of list_collections in the adapter file
rg -n "def list_collections" cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py

# Search for any references to list_collections across the entire repository
rg -n "list_collections" .
```

Length of output: 581
🏁 Script executed:
```bash
#!/bin/bash
# Locate class definition and inspect its base classes
rg -n "class ChromaDBAdapter" -n cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py

# Show the first 200 lines of the file to see imports and class signature
sed -n '1,200p' cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py
```

Length of output: 6703
🏁 Script executed:
```bash
#!/bin/bash
# Locate the file containing VectorDBInterface definition
rg -l "class VectorDBInterface" -n .

# Once located, display its contents to check for list_collections definition
# Replace <path> with the actual path from the previous command
sed -n '1,200p' cognee/infrastructure/databases/vector/vector_db_interface.py

# Search within the interface for any mention of list_collections
rg -n "list_collections" cognee/infrastructure/databases/vector/vector_db_interface.py
```

Length of output: 4946
Fix missing `list_collections` implementation in ChromaDBAdapter

It appears `ChromaDBAdapter` calls `self.list_collections()` on line 536, but neither the adapter nor `VectorDBInterface` defines this method; this will lead to an AttributeError at runtime.

• File: cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py
• Line: 536

Suggested remedy: invoke the ChromaDB client’s `list_collections` method instead, for example:

```diff
-        collections = await self.list_collections()
+        conn = await self.get_connection()
+        collections = await conn.list_collections()
```

Or, if you intend to keep `self.list_collections()`, add and implement that method in the adapter to delegate to the client.

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
conn = await self.get_connection()
collections = await conn.list_collections()
```

🧰 Tools
🪛 Pylint (3.3.7)
[error] 536-536: Instance of 'ChromaDBAdapter' has no 'list_collections' member
(E1101)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py at line 536, the code calls self.list_collections() but this method is not defined in the class or its base classes, causing a runtime error. To fix this, replace the call to self.list_collections() with a direct call to the ChromaDB client’s list_collections method, or alternatively, define a list_collections method in ChromaDBAdapter that delegates to the client’s list_collections method.
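If the second option (keeping `self.list_collections()`) is preferred, a delegation sketch could look like the following. This is only an illustration: `get_connection` and the async Chroma client API are carried over from the suggestion above as assumptions, not verified against the adapter.

```python
class ChromaDBAdapterSketch:
    """Illustrative fragment only; the real adapter has many more methods."""

    def __init__(self, get_connection):
        # `get_connection` is assumed to return an async ChromaDB client.
        self.get_connection = get_connection

    async def list_collections(self):
        # Delegate to the underlying client so callers can keep using
        # `await self.list_collections()` unchanged.
        connection = await self.get_connection()
        return await connection.list_collections()
```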
cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py (2)
759-759:
⚠️ Potential issue

Fix method call with missing argument.

The call to `delete_data_points` is missing the required `collection_name` argument and should be awaited.

```diff
-        self.delete_data_points(data_point_ids)
+        await self.delete_data_points(collection_name, data_point_ids)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```diff
-        self.delete_data_points(data_point_ids)
+        await self.delete_data_points(collection_name, data_point_ids)
```

🧰 Tools
🪛 Pylint (3.3.7)
[error] 759-759: No value for argument 'data_point_ids' in method call
(E1120)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py at line 759, the method call to delete_data_points is missing the required collection_name argument and is not awaited. Fix this by adding the appropriate collection_name argument to the call and prefixing the call with await to properly handle the asynchronous operation.
745-745:
⚠️ Potential issue

Fix method call with missing argument.

The call to `delete_data_points` is missing the required `data_point_ids` argument.

```diff
-        return await self.delete_data_points([data_point_id])
+        return await self.delete_data_points(collection_name, [data_point_id])
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
return await self.delete_data_points(collection_name, [data_point_id])
```

🧰 Tools
🪛 Pylint (3.3.7)
[error] 745-745: No value for argument 'data_point_ids' in method call
(E1120)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/hybrid/falkordb/FalkorDBAdapter.py at line 745, the call to delete_data_points is missing the required argument data_point_ids. Fix this by passing the data_point_id inside a list as the argument to delete_data_points, ensuring the method receives the expected parameter.
cognee/infrastructure/databases/graph/networkx/adapter.py (1)
87-98: 💡 Verification agent
🧩 Analysis chain
Verify method signature consistency with base interface.
Static analysis indicates that the `add_node` method parameter count differs from the base interface (3 parameters expected vs 2 provided). Please ensure this method signature matches the `GraphDBInterface` definition.
🏁 Script executed:
```bash
#!/bin/bash
# Verify the add_node method signature in the base interface
ast-grep --pattern 'async def add_node($$$)'
```

Length of output: 14746
Align add_node signature with GraphDBInterface
The `GraphDBInterface` defines:

```python
async def add_node(self, node_id: str, properties: Dict[str, Any]) -> None
```

but in `networkx/adapter.py` (and other adapters) we have:

```python
async def add_node(self, node: DataPoint) -> None
```

These signatures don’t match and will break the polymorphic contract. Please choose one of the following approaches and apply it consistently across the interface and all adapters:

- Update the interface to accept a `DataPoint`:

```diff
--- graph_db_interface.py:166
-    async def add_node(self, node_id: str, properties: Dict[str, Any]) -> None:
+    async def add_node(self, node: DataPoint) -> None:
```

- Or update all adapters to split the `DataPoint` into `node_id` and `properties`:

```diff
--- networkx/adapter.py:87
-    async def add_node(self, node: DataPoint) -> None:
+    async def add_node(self, node_id: str, properties: Dict[str, Any]) -> None:
         self.graph.add_node(node_id, **properties)
         await self.save_graph_to_file(self.filename)
```

• Files needing updates:
- cognee/infrastructure/databases/graph/graph_db_interface.py: 166
- cognee/infrastructure/databases/graph/networkx/adapter.py: 87
- (And the other adapters in `falkordb`, `neo4j_driver`, `memgraph`, `kuzu`)

🧰 Tools
🪛 Pylint (3.3.7)
[warning] 87-87: Number of parameters was 3 in 'GraphDBInterface.add_node' and is now 2 in overriding 'NetworkXAdapter.add_node' method
(W0221)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/networkx/adapter.py lines 87 to 98, the add_node method signature uses a single DataPoint parameter, which conflicts with the GraphDBInterface definition expecting two parameters: node_id (str) and properties (Dict[str, Any]). To fix this, refactor the add_node method to accept node_id and properties separately, then update the method body to add the node using these parameters. Also, ensure this change is applied consistently across all adapter implementations and the interface to maintain polymorphic compatibility.
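A sketch of the second option (adapters take `node_id` plus `properties`), under the assumption that the NetworkX graph object and the `save_graph_to_file` helper behave as referenced in the diff above; the class and constructor here are illustrative, not the repository's code:

```python
from typing import Any, Dict


class NetworkXAdapterSketch:
    """Illustrative fragment; mirrors only the add_node portion of an adapter."""

    def __init__(self, graph, filename: str, save_graph_to_file):
        self.graph = graph
        self.filename = filename
        self.save_graph_to_file = save_graph_to_file

    async def add_node(self, node_id: str, properties: Dict[str, Any]) -> None:
        # Signature now matches GraphDBInterface.add_node(node_id, properties).
        self.graph.add_node(node_id, **properties)
        await self.save_graph_to_file(self.filename)
```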
cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py (1)
274-276:
⚠️ Potential issue
`await` missing – points are never uploaded

`AsyncQdrantClient.upload_points` returns a coroutine. Without `await`, the upload is skipped and a warning is raised at runtime.

```diff
-        client.upload_points(collection_name=collection_name, points=points)
+        await client.upload_points(collection_name=collection_name, points=points)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
try:
    await client.upload_points(collection_name=collection_name, points=points)
except UnexpectedResponse as error:
```

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/vector/qdrant/QDrantAdapter.py around lines 274 to 276, the call to client.upload_points is missing an await keyword, causing the coroutine to not execute and points not to be uploaded. Add the await keyword before client.upload_points to properly await the asynchronous upload operation and ensure points are uploaded as intended.
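The underlying pitfall is generic to asyncio: calling a coroutine function without `await` only creates a coroutine object, and the body never runs. A minimal illustration with no Qdrant dependency (function names are made up for the demo):

```python
import asyncio

uploaded = []


async def upload_points(points):
    # Pretend network call; appends only when actually awaited.
    uploaded.extend(points)


async def main():
    upload_points(["a"])        # coroutine created but never executed (RuntimeWarning)
    await upload_points(["b"])  # executes as intended
    print(uploaded)             # ['b']


asyncio.run(main())
```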
cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (10)
178-195: 🛠️ Refactor suggestion
`add_nodes` batch query has two breaking issues

- `n:node.label` (and the analogue in the `ON MATCH` clause) suffers from the same dynamic-label problem as above.
- The unwind variable is named `node`, yet the property access in Cypher uses `node.label` while the Python dictionary key is `"label"`.

A minimal fix that keeps the single query could be:

```diff
 UNWIND $nodes AS node_data
-MERGE (n {id: node.node_id})
-ON CREATE SET n:node.label, n += node.properties, n.updated_at = timestamp()
-ON MATCH SET n:node.label, n += node.properties, n.updated_at = timestamp()
+MERGE (n:{label} {id: node_data.node_id})
+ON CREATE SET n += node_data.properties, n.updated_at = timestamp()
+ON MATCH SET n += node_data.properties, n.updated_at = timestamp()
 RETURN ID(n) AS internal_id, n.id AS nodeId
```

with

```python
query = query.format(label="{label}")  # or build one query per distinct label
```

Otherwise split the list by label and run one query per label.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 195-195: Consider using '{"nodes": nodes}' instead of a call to 'dict'.
(R1735)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 178 to 195, the Cypher query incorrectly uses a dynamic label with n:node.label which is not valid, and the unwind variable 'node' conflicts with the property access node.label. To fix this, modify the query to use Python string formatting to inject the label dynamically, for example by formatting the query with the label placeholder replaced by the actual label string. Alternatively, split the nodes list by label and run separate queries per label to avoid dynamic label issues. Adjust the query and the call to self.query accordingly to ensure the label is correctly applied in the Cypher MERGE statement.
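For illustration, one way to realise the "split the list by label" option is to group the payloads in Python and issue one MERGE statement per label. This is a sketch only: the `adapter.query` coroutine, `serialize_properties`, and the `model_dump()` payload shape are assumptions carried over from the discussion above, not verified against the adapter.

```python
from collections import defaultdict


async def add_nodes_by_label(adapter, nodes):
    # Group node payloads so each Cypher statement uses a single, static label.
    grouped = defaultdict(list)
    for node in nodes:
        grouped[type(node).__name__].append(
            {
                "node_id": str(node.id),
                "properties": adapter.serialize_properties(node.model_dump()),
            }
        )

    for label, payload in grouped.items():
        # The label is embedded as trusted text; everything else stays parameterised.
        query = f"""
        UNWIND $nodes AS node_data
        MERGE (n:{label} {{id: node_data.node_id}})
        ON CREATE SET n += node_data.properties, n.updated_at = timestamp()
        ON MATCH SET n += node_data.properties, n.updated_at = timestamp()
        RETURN ID(n) AS internal_id, n.id AS nodeId
        """
        await adapter.query(query, {"nodes": payload})
```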
255-260:
⚠️ Potential issue

Malformed Cypher in `delete_node`

`MATCH (node: {{id: $node_id}})` contains an extra colon and double braces, leading to a syntax error. Property-maps don’t use `:` after the variable and don’t need sanitising.

```diff
-sanitized_id = node_id.replace(":", "_")
-query = "MATCH (node: {{id: $node_id}}) DETACH DELETE node"
-params = {"node_id": sanitized_id}
+query = "MATCH (node {id: $node_id}) DETACH DELETE node"
+params = {"node_id": node_id}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
query = "MATCH (node {id: $node_id}) DETACH DELETE node"
params = {"node_id": node_id}

return await self.query(query, params)
```

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 255 to 260, the Cypher query in delete_node is malformed due to an extra colon and double braces in MATCH (node: {{id: $node_id}}). Remove the colon after node and replace double braces with single braces to correctly specify the property map as MATCH (node {id: $node_id}). Also, remove the sanitization of node_id since it is unnecessary for property values.
697-705:
⚠️ Potential issue

Syntax issues in `remove_connection_to_predecessors_of`

`MATCH (node {id: nid})` is missing `$` before `nid`; additionally, the curly braces after `UNWIND` should refer to the alias without extra braces.

```diff
 UNWIND $node_ids AS nid
-MATCH (node {id: nid})-[r]->(predecessor)
+MATCH (node {id: $nid})-[r]->(predecessor)
 WHERE type(r) = $edge_label
 DELETE r
```

(The first line is fine; it is the second that needs parameterization.)
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 697 to 705, the Cypher query has syntax issues: the MATCH clause should use parameterized syntax with `$nid` instead of `nid` without the dollar sign, and the curly braces after UNWIND should not enclose the alias with extra braces. Fix the MATCH line to use `(node {id: $nid})` and ensure the UNWIND line correctly references the parameter without unnecessary braces.
330-350:
⚠️ Potential issue
`has_edges` mixes internal ids with business ids

The Cypher uses `id(a)` / `id(b)` (database-internal ids) whereas the supplied parameters are the application ids (`edge.from_node`, `str(UUID)`). The check always fails unless internal and business ids coincide.

Fix by matching on the `id` property instead:

```diff
-MATCH (a)-[r]->(b)
-WHERE id(a) = edge.from_node AND id(b) = edge.to_node AND type(r) = edge.relationship_name
+MATCH (a {id: edge.from_node})-[r]->(b {id: edge.to_node})
+WHERE type(r) = edge.relationship_name
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
        query = """
        UNWIND $edges AS edge
        MATCH (a {id: edge.from_node})-[r]->(b {id: edge.to_node})
        WHERE type(r) = edge.relationship_name
        RETURN edge.from_node AS from_node,
               edge.to_node AS to_node,
               edge.relationship_name AS relationship_name,
               count(r) > 0 AS edge_exists
        """

        try:
            params = {
                "edges": [
                    {
                        "from_node": str(edge[0]),
                        "to_node": str(edge[1]),
                        "relationship_name": edge[2],
                    }
                    for edge in edges
                ],
            }

            results = await self.query(query, params)
            return [result["edge_exists"] for result in results]
```

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 330 to 350, the Cypher query incorrectly matches nodes using internal database ids with id(a) and id(b), while the parameters use application-level ids as strings. To fix this, change the query to match nodes by their 'id' property (e.g., a.id = edge.from_node and b.id = edge.to_node) instead of using the internal id() function, ensuring the query compares the correct identifiers.
750-777: 🛠️ Refactor suggestion
Dangerous mutable default argument
`def serialize_properties(self, properties=dict()):` binds a single dict shared by every call. Use `None` and initialise inside.

```diff
-    def serialize_properties(self, properties=dict()):
+    def serialize_properties(self, properties: dict | None = None):
         ...
-        for property_key, property_value in properties.items():
+        for property_key, property_value in (properties or {}).items():
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
def serialize_properties(self, properties: dict | None = None):
    """
    Convert property values to a suitable representation for storage.

    Parameters:
    -----------

        - properties: A dictionary of properties to serialize. (default None)

    Returns:
    --------

        A dictionary of serialized properties.
    """
    serialized_properties = {}

    for property_key, property_value in (properties or {}).items():
        if isinstance(property_value, UUID):
            serialized_properties[property_key] = str(property_value)
            continue

        if isinstance(property_value, dict):
            serialized_properties[property_key] = json.dumps(property_value, cls=JSONEncoder)
            continue

        serialized_properties[property_key] = property_value

    return serialized_properties
```

🧰 Tools
🪛 Ruff (0.11.9)

750-750: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function. (B006)

🪛 Pylint (3.3.7)

[warning] 750-750: Dangerous default value dict() (builtins.dict) as argument (W0102)

[refactor] 750-750: Consider using '{}' instead of a call to 'dict'. (R1735)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 750 to 777, the method serialize_properties uses a mutable default argument (properties=dict()), which can lead to unexpected behavior due to shared state across calls. Change the default value of properties to None and inside the method, initialize it to an empty dictionary if it is None before proceeding with serialization.
65-70: 🛠️ Refactor suggestion
Add explicit driver-shutdown hook
`AsyncGraphDatabase.driver()` opens TCP connections that remain open for the entire process lifetime. The class currently never calls `await self.driver.close()`, which will eventually leak sockets (esp. in test-suites that spin up many adapters).

```diff
 class MemgraphAdapter(GraphDBInterface):
     ...
     def __init__(...):
         ...
         self.driver = driver or AsyncGraphDatabase.driver(
             graph_database_url,
             auth=(graph_database_username, graph_database_password),
             max_connection_lifetime=120,
         )
+
+    async def close(self) -> None:
+        """
+        Gracefully dispose the underlying Neo4j/Memgraph driver.
+        """
+        if self.driver:
+            await self.driver.close()
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
        self.driver = driver or AsyncGraphDatabase.driver(
            graph_database_url,
            auth=(graph_database_username, graph_database_password),
            max_connection_lifetime=120,
        )

    async def close(self) -> None:
        """
        Gracefully dispose the underlying Neo4j/Memgraph driver.
        """
        if self.driver:
            await self.driver.close()
```

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 65 to 70, the AsyncGraphDatabase.driver() creates TCP connections that stay open indefinitely because the driver is never explicitly closed. To fix this, add an asynchronous shutdown method in the class that calls await self.driver.close() to properly close the connections when the adapter is no longer needed, preventing socket leaks especially during tests.
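Beyond an explicit `close()`, callers often wrap such an adapter in an async context manager so cleanup cannot be forgotten. A small sketch under the assumption that the adapter exposes the `close()` method proposed above; `adapter_factory` is a placeholder for whatever builds the adapter in the caller:

```python
import contextlib


@contextlib.asynccontextmanager
async def open_adapter(adapter_factory):
    # Ensures driver sockets are released even if the body raises.
    adapter = adapter_factory()
    try:
        yield adapter
    finally:
        await adapter.close()


# Usage sketch (constructor arguments elided):
# async with open_adapter(lambda: MemgraphAdapter(...)) as adapter:
#     await adapter.query("RETURN 1", {})
```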
651-676: 🛠️ Refactor suggestion
`get_connections` builds tuples from the wrong object

The loop variable `neighbour` shadows the record, resulting in type errors; moreover it again indexes into the relationship object. Use the fields returned by the query:

```diff
 for record in predecessors:
-    neighbour = neighbour["relation"]
-    connections.append((neighbour[0], {"relationship_name": neighbour[1]}, neighbour[2]))
+    rel = record["relation"]
+    connections.append(
+        (record["neighbour"]["id"], {"relationship_name": type(rel).__name__}, record["node"]["id"])
+    )
```

Do the analogous change for the `successors` loop.

Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 663-663: Consider using '{"node_id": str(node_id)}' instead of a call to 'dict'.
(R1735)
[refactor] 664-664: Consider using '{"node_id": str(node_id)}' instead of a call to 'dict'.
(R1735)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py lines 651 to 676, the variable 'neighbour' in the loops shadows the record and incorrectly indexes into the relationship object. To fix this, rename the loop variable to avoid shadowing and directly use the fields returned by the query (neighbour, relation, node) or (node, relation, neighbour) as appropriate. Adjust the tuple construction to use these fields correctly without re-indexing into the relationship object. Apply the same fix to both the predecessors and successors loops.
149-161:
⚠️ Potential issue

Dynamic label injection is syntactically invalid

`node:$node_label` (and the identical pattern in the `ON MATCH` clause) is not valid Cypher – label names cannot be parameterised. The statement will raise a `Neo4jError: Parameter maps cannot be used for labels`. Embed the label into the query text instead (safe because it comes from `type(node).__name__`, i.e. trusted code) and drop the unused `$node_label` parameter.

```diff
-        MERGE (node {id: $node_id})
-        ON CREATE SET node:$node_label, node += $properties, node.updated_at = timestamp()
-        ON MATCH SET node:$node_label, node += $properties, node.updated_at = timestamp()
+        MERGE (node:{node_label} {{id: $node_id}})
+        ON CREATE SET node += $properties, node.updated_at = timestamp()
+        ON MATCH SET node += $properties, node.updated_at = timestamp()
```

and

```diff
-        params = {
-            "node_id": str(node.id),
-            "node_label": type(node).__name__,
-            "properties": serialized_properties,
-        }
+        params = {
+            "node_id": str(node.id),
+            "properties": serialized_properties,
+        }
+        query = query.format(node_label=type(node).__name__)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
        query = """
        MERGE (node:{node_label} {{id: $node_id}})
        ON CREATE SET node += $properties, node.updated_at = timestamp()
        ON MATCH SET node += $properties, node.updated_at = timestamp()
        RETURN ID(node) AS internal_id, node.id AS nodeId
        """

        params = {
            "node_id": str(node.id),
            "properties": serialized_properties,
        }
        query = query.format(node_label=type(node).__name__)

        return await self.query(query, params)
```

🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 149 to 161, the Cypher query incorrectly uses a parameter for the node label, which is not allowed. To fix this, remove the parameterized label `$node_label` from the query and instead directly embed the label string from `type(node).__name__` into the query text. Also, remove the `node_label` entry from the `params` dictionary since it will no longer be used.
468-478:
⚠️ Potential issue

Relationship extraction returns garbage

`result["r"][1]` indexes into the relationship object (which is not subscriptable). Return the relationship type captured in Cypher instead.

```diff
-MATCH (n {id: $node_id})-[r]-(m)
-RETURN n, r, m
+MATCH (n {id: $node_id})-[r]-(m)
+RETURN n, TYPE(r) AS rel_type, m
 ...
 return [
-    (result["n"]["id"], result["m"]["id"], {"relationship_name": result["r"][1]})
+    (result["n"]["id"], result["m"]["id"], {"relationship_name": result["rel_type"]})
     for result in results
 ]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
        query = """
        MATCH (n {id: $node_id})-[r]-(m)
        RETURN n, TYPE(r) AS rel_type, m
        """

        results = await self.query(query, dict(node_id=node_id))

        return [
            (result["n"]["id"], result["m"]["id"], {"relationship_name": result["rel_type"]})
            for result in results
        ]
```

🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 473-473: Consider using '{"node_id": node_id}' instead of a call to 'dict'.
(R1735)
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 468 to 478, the code incorrectly tries to subscript the relationship object with result["r"][1], which is not valid. Modify the Cypher query to explicitly return the relationship type using the type() function, e.g., RETURN n, r, m, type(r) AS rel_type, and then update the code to use result["rel_type"] instead of result["r"][1] to correctly extract the relationship type.
726-733:
⚠️ Potential issue
`remove_connection_to_successors_of` uses dynamic labels incorrectly

`MATCH (node:{id})<-[r:{edge_label}]-(successor)` is malformed:

- Back-ticks are placed around the text `{id}` instead of the variable.
- `edge_label` is interpolated directly, risking Cypher injection.

Prefer property-based matching (consistent with the rest of the adapter):

```diff
 UNWIND $node_ids AS nid
-MATCH (node:`{id}`)<-[r:{edge_label}]-(successor)
+MATCH (successor)-[r]->(node {id: nid})
+WHERE type(r) = $edge_label
 DELETE r
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py around lines 726 to 733, the Cypher query uses dynamic labels incorrectly by placing back-ticks around the literal text `{id}` instead of the variable and directly interpolating `edge_label`, which risks injection. To fix this, replace dynamic label usage with property-based matching by removing back-ticks and parameterizing both node labels and edge labels as properties in the query, ensuring all variables are passed safely via parameters to prevent injection.
<!-- .github/pull_request_template.md --> ## Description <!-- Provide a clear description of the changes in this PR --> ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin. --------- Co-authored-by: vasilije <[email protected]> Co-authored-by: Igor Ilic <[email protected]> Co-authored-by: Vasilije <[email protected]> Co-authored-by: Igor Ilic <[email protected]> Co-authored-by: Hande <[email protected]> Co-authored-by: Matea Pesic <[email protected]> Co-authored-by: hajdul88 <[email protected]> Co-authored-by: Daniel Molnar <[email protected]> Co-authored-by: Diego Baptista Theuerkauf <[email protected]> Co-authored-by: github-actions[bot] <[email protected]>
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.