release: 0.2.4 #1300
Conversation
…el and logging from child error
Co-authored-by: Boris <[email protected]>
## Description Resolve issue with Windows path ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
## Description Update dev branch with changes from main ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
…tputs, and updated Python version to 3.12.7. Removed unnecessary code cells and ensured consistent environment variable handling across demos.
Update Jupyter notebooks: Refactored code cells and ensured consistent environment variable handling across demos. ## DCO Affirmation I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.
```yaml
    name: CLI Unit Tests
    runs-on: ubuntu-22.04
    env:
      LLM_PROVIDER: openai
      LLM_MODEL: ${{ secrets.LLM_MODEL }}
      LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
      LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
      LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
      EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
      EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
      EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
      EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Cognee Setup
        uses: ./.github/actions/cognee_setup
        with:
          python-version: ${{ inputs.python-version }}

      - name: Run CLI Unit Tests
        run: uv run pytest cognee/tests/unit/cli/ -v

  cli-integration-tests:
```
Check warning (Code scanning / CodeQL): Workflow does not contain permissions (Medium)
Copilot Autofix (AI, 4 months ago)
To resolve the flagged issue, the workflow should explicitly specify permissions: to restrict the default GitHub token to the minimum necessary. Since the test jobs (cli-unit-tests, cli-integration-tests, etc.) only check out repository code and run tests, the minimal required permission is contents: read. This can be set at the workflow (top) level so all jobs inherit the restriction, unless a job requires broader/scoped permissions.
Add a permissions: block after the workflow's name: declaration (line 1), before or after the on: block (the order does not matter). This can be achieved by inserting:
```yaml
permissions:
  contents: read
```

This change makes it clear the workflow only needs to read source code, with no write or admin permissions provided.
```diff
@@ -1,3 +1,5 @@
+permissions:
+  contents: read
 name: CLI Tests
 
 on:
```
| name: CLI Integration Tests | ||
| runs-on: ubuntu-22.04 | ||
| env: | ||
| LLM_PROVIDER: openai | ||
| LLM_MODEL: ${{ secrets.LLM_MODEL }} | ||
| LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }} | ||
| LLM_API_KEY: ${{ secrets.LLM_API_KEY }} | ||
| LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }} | ||
| EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }} | ||
| EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }} | ||
| EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }} | ||
| EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }} | ||
| steps: | ||
| - name: Check out repository | ||
| uses: actions/checkout@v4 | ||
| with: | ||
| fetch-depth: 0 | ||
|
|
||
| - name: Cognee Setup | ||
| uses: ./.github/actions/cognee_setup | ||
| with: | ||
| python-version: ${{ inputs.python-version }} | ||
|
|
||
| - name: Run CLI Integration Tests | ||
| run: uv run pytest cognee/tests/integration/cli/ -v | ||
|
|
||
| cli-functionality-tests: |
Check warning (Code scanning / CodeQL): Workflow does not contain permissions (Medium)
Copilot Autofix (AI, 4 months ago)
To fix this problem, we should explicitly set the minimal necessary permissions for the workflow. Since all jobs in this workflow only check out the code and run Python tests, the only required permission for the GITHUB_TOKEN is read access to the repository contents. This can be accomplished by adding a permissions block at the top level of the workflow file (after the name field, before on:), ensuring it applies to every job unless overridden.
Change: add the following block near the top of .github/workflows/cli_tests.yml:

```yaml
permissions:
  contents: read
```

No further code or dependency changes are required.
```diff
@@ -1,3 +1,5 @@
+permissions:
+  contents: read
 name: CLI Tests
 
 on:
```
```yaml
    name: CLI Functionality Tests
    runs-on: ubuntu-22.04
    env:
      LLM_PROVIDER: openai
      LLM_MODEL: ${{ secrets.LLM_MODEL }}
      LLM_ENDPOINT: ${{ secrets.LLM_ENDPOINT }}
      LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
      LLM_API_VERSION: ${{ secrets.LLM_API_VERSION }}
      EMBEDDING_MODEL: ${{ secrets.EMBEDDING_MODEL }}
      EMBEDDING_ENDPOINT: ${{ secrets.EMBEDDING_ENDPOINT }}
      EMBEDDING_API_KEY: ${{ secrets.EMBEDDING_API_KEY }}
      EMBEDDING_API_VERSION: ${{ secrets.EMBEDDING_API_VERSION }}
    steps:
      - name: Check out repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Cognee Setup
        uses: ./.github/actions/cognee_setup
        with:
          python-version: ${{ inputs.python-version }}

      - name: Test CLI Help Commands
        run: |
          uv run python -m cognee.cli._cognee --help
          uv run python -m cognee.cli._cognee --version
          uv run python -m cognee.cli._cognee add --help
          uv run python -m cognee.cli._cognee search --help
          uv run python -m cognee.cli._cognee cognify --help
          uv run python -m cognee.cli._cognee delete --help
          uv run python -m cognee.cli._cognee config --help

      - name: Test CLI Config Subcommands
        run: |
          uv run python -m cognee.cli._cognee config get --help
          uv run python -m cognee.cli._cognee config set --help
          uv run python -m cognee.cli._cognee config list --help
          uv run python -m cognee.cli._cognee config unset --help
          uv run python -m cognee.cli._cognee config reset --help

      - name: Test CLI Error Handling
        run: |
          # Test invalid command (should fail gracefully)
          ! uv run python -m cognee.cli._cognee invalid_command
          # Test missing required arguments (should fail gracefully)
          ! uv run python -m cognee.cli._cognee search
          # Test invalid search type (should fail gracefully)
          ! uv run python -m cognee.cli._cognee search "test" --query-type INVALID_TYPE
          # Test invalid chunker (should fail gracefully)
          ! uv run python -m cognee.cli._cognee cognify --chunker InvalidChunker
```
Check warning (Code scanning / CodeQL): Workflow does not contain permissions (Medium)
Copilot Autofix (AI, 4 months ago)
The best way to address this issue is to add an explicit permissions block at the root of the workflow YAML. This block should restrict permissions of the GITHUB_TOKEN to only what is needed—typically contents: read for workflows that just check out source code and run tests. Since none of the jobs here push to the repo, manage issues, pull requests, packages, or environments, and only access code for testing, contents: read is sufficient. The block should be added immediately after the name: CLI Tests (or after the on: block) before env: or jobs:.
No other imports, methods, or code definitions are needed; only a single YAML config line is required.
```diff
@@ -30,6 +30,8 @@
         required: true
 
 
+permissions:
+  contents: read
 env:
   RUNTIME__LOG_LEVEL: ERROR
   ENV: 'dev'
```
Please make sure all the checkboxes are checked:
Walkthrough

Refactors pipelines to a new run_pipeline flow with authorization/setup layers; extends search to support interaction logging and feedback; adds custom_prompt across cognify and LLM extraction; renames token limits to max_completion_tokens across LLM/embedding/tokenizers; removes NetworkX graph adapter; introduces a full CLI; updates defaults to gpt-5-mini; broad exception hierarchy rework; adjusts file utils and tests.

Changes
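As a rough orientation for reviewers, here is a minimal sketch exercising the reworked surface described above; the SearchType import path and the exact keyword names are read off this PR's diff and should be treated as assumptions rather than a verified example:

```python
# Illustrative only: touches the new cognify/search options named in this PR
# (custom_prompt, save_interaction); exact signatures are assumptions.
import asyncio

import cognee
from cognee.modules.search.types import SearchType

async def demo():
    await cognee.add("Cognee turns documents into a knowledge graph.")
    # custom_prompt: new in this PR, forwarded down to extract_graph_from_data
    await cognee.cognify(custom_prompt="Extract entities and their relationships.")
    # save_interaction: new in this PR, persists query/context/answer triplets
    results = await cognee.search(
        query_text="What does cognee do?",
        query_type=SearchType.GRAPH_COMPLETION,
        save_interaction=True,
    )
    print(results)

if __name__ == "__main__":
    asyncio.run(demo())
```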
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant User
participant API as /v1/search
participant Search as specific_search_by_context
participant Retriever as Graph*Retriever
participant LLM as LLMGateway
participant Graph as GraphEngine
participant Store as add_data_points
User->>API: query_text, query_type, top_k, save_interaction, last_k
API->>Search: authorize + dispatch
Search->>Retriever: get_context(query)
Retriever->>Graph: cypher/vector queries (top_k)
Graph-->>Retriever: context, triplets
Retriever->>LLM: generate completion
LLM-->>Retriever: completion
alt save_interaction == true
Retriever->>Store: add CogneeUserInteraction
Store->>Graph: add nodes/edges (used_graph_element_to_answer)
end
API-->>User: results
```

```mermaid
sequenceDiagram
autonumber
participant Client as Client/Router
participant Cog as cognify()
participant Exec as get_pipeline_executor()
participant Pipe as run_pipeline(...)
participant Task as Task(extract_graph_from_data)
participant LLM as LLMGateway.extract_content_graph
Client->>Cog: data, datasets, custom_prompt?
Cog->>Exec: choose executor (bg/blocking)
Exec-->>Cog: executor func
Cog->>Pipe: run_pipeline(tasks, data, ...)
Pipe->>Task: extract_graph_from_data(..., custom_prompt)
Task->>LLM: extract_content_graph(..., custom_prompt)
LLM-->>Task: content graph
Task-->>Pipe: nodes/edges
Pipe-->>Cog: run info
Cog-->>Client: result
```

```mermaid
sequenceDiagram
autonumber
participant User
participant Mini as python -m cognee
participant Main as cognee.cli._cognee.main
participant Cmd as Command.execute
participant Pkg as cognee (lazy import)
User->>Mini: args
Mini->>Main: delegate
Main->>Main: discover commands + parse
Main->>Cmd: execute(parsed args)
Cmd->>Pkg: call async API via asyncio.run(...)
Pkg-->>Cmd: result / raises
alt success
Cmd-->>Main: exit code 0
else error
Cmd-->>Main: CliCommandException(error_code)
end
Main-->>User: exit
```
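To make the dispatch flow in the diagram above concrete, here is a hedged sketch of a command in that shape; the attribute and method names (command_string, configure_parser) are assumptions about the PR's command protocol, SayHelloCommand itself is hypothetical, and CliCommandException's constructor arguments are assumed:

```python
# Hypothetical command following the discover -> parse -> execute flow above.
import argparse
import asyncio

from cognee.cli.exceptions import CliCommandException

class SayHelloCommand:
    command_string = "hello"          # assumed attribute name
    help_string = "Print a greeting"  # assumed attribute name

    def configure_parser(self, parser: argparse.ArgumentParser) -> None:
        parser.add_argument("name", help="Who to greet")

    def execute(self, args: argparse.Namespace) -> None:
        async def _run() -> str:
            return f"Hello, {args.name}!"
        try:
            # Async work is wrapped in asyncio.run, mirroring the diagram
            print(asyncio.run(_run()))
        except Exception as error:
            # Non-zero error codes surface through main()'s exit handling
            raise CliCommandException(str(error), error_code=1)
```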
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related issues
Possibly related PRs
Poem
Actionable comments posted: 49
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (38)
cognee/infrastructure/files/utils/get_file_metadata.py (1)
49-56: content_hash may be undefined; except block can also raise AttributeError.

If get_file_content_hash/seek fails, content_hash is never set, causing UnboundLocalError at return. Also logger.error references file.name, which may not exist.

```diff
-    try:
+    content_hash = ""
+    try:
         file.seek(0)
         content_hash = await get_file_content_hash(file)
         file.seek(0)
     except io.UnsupportedOperation as error:
-        logger.error(f"Error retrieving content hash for file: {file.name} \n{str(error)}\n\n")
+        logger.exception(
+            f"Error retrieving content hash for file: {getattr(file, 'name', '<unknown>')}"
+        )
```

cognee/infrastructure/loaders/LoaderInterface.py (1)
61-73: Add explicit return type and fix docstring (remove stale file_stream).

Align the interface with the path-based design and actual implementations returning a stored file path.

```diff
-    async def load(self, file_path: str, **kwargs):
+    async def load(self, file_path: str, **kwargs) -> str:
         """
-        Load and process the file, returning standardized result.
+        Load and process the file.
 
         Args:
-            file_path: Path to the file to be processed
-            file_stream: If file stream is provided it will be used to process file instead
-            **kwargs: Additional loader-specific configuration
+            file_path: Path to the file to be processed.
+            **kwargs: Loader-specific configuration.
+
+        Returns:
+            str: Full path to the stored processed content.
 
         Raises:
-            Exception: If file cannot be processed
+            Exception: If the file cannot be processed.
         """
         pass
```

Optionally, consider accepting pathlib.Path / os.PathLike for file_path in the interface for flexibility.
cognee/infrastructure/files/utils/get_data_file_path.py (1)
31-39: S3 URL construction uses os.sep/os.path, yielding invalid backslashes on Windows.

Using os.sep introduces backslashes in s3:// URLs on Windows (invalid). Normalize with POSIX semantics.
Apply this diff:
```diff
-    elif file_path.startswith("s3://"):
-        # Handle S3 URLs without normalization (which corrupts them)
-        parsed_url = urlparse(file_path)
-
-        normalized_url = (
-            f"s3://{parsed_url.netloc}{os.sep}{os.path.normpath(parsed_url.path).lstrip(os.sep)}"
-        )
-
-        return normalized_url
+    elif file_path.startswith("s3://"):
+        # Preserve POSIX-style separators regardless of OS
+        parsed_url = urlparse(file_path)
+        import posixpath
+        key = posixpath.normpath(parsed_url.path or "")
+        if not key.startswith("/"):
+            key = "/" + key
+        return f"s3://{parsed_url.netloc}{key}"
```

cognee/infrastructure/databases/graph/get_graph_engine.py (1)
67-75: Ensure adapter constructors align with passed kwargs

The current implementation unconditionally invokes

```python
adapter(
    graph_database_url=...,
    graph_database_username=...,
    graph_database_password=...,
)
```

but most of our graph adapters neither accept all three parameters nor use those exact names:

• MemgraphAdapter (cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py): signature __init__(self, graph_database_url: str); does not accept graph_database_username or graph_database_password.
• Neo4jAdapter (cognee/infrastructure/databases/graph/neo4j_driver/adapter.py): signature __init__(self, graph_database_url: str); does not accept graph_database_username or graph_database_password.
• RemoteKuzuAdapter (cognee/infrastructure/databases/graph/kuzu/remote_kuzu_adapter.py): signature __init__(self, api_url: str, username: str, password: str); does not accept graph_database_url; expects api_url.
• KuzuAdapter (cognee/infrastructure/databases/graph/kuzu/adapter.py): signature __init__(self, db_path: str); no support for URL, username, or password.
• NeptuneAnalyticsAdapter (cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py): inherited from NeptuneGraphDB; constructor expects graph_id: str, not URL or credentials.

As a result, calls to adapters other than RemoteKuzuAdapter will raise TypeError: unexpected keyword argument ….

Recommendations:

• Filter and rename kwargs before invoking the adapter, for example:

```python
from inspect import signature

sig_params = set(signature(adapter.__init__).parameters) - {"self"}
init_kwargs = {
    "graph_database_url": graph_database_url,
    "graph_database_username": graph_database_username,
    "graph_database_password": graph_database_password,
}
filtered = {
    param: init_kwargs[k]
    for k, param in [
        ("graph_database_url", "graph_database_url"),
        ("api_url", "api_url"),  # for RemoteKuzuAdapter
        ("username", "username"),  # for RemoteKuzuAdapter
        ("password", "password"),  # for RemoteKuzuAdapter
        ("graph_id", "graph_id"),  # for NeptuneAnalyticsAdapter
    ]
    if k in init_kwargs and param in sig_params
}
return adapter(**filtered)
```

• Alternatively, wrap each adapter in a small factory that transforms the common kwargs into the parameters its constructor expects.

Please update get_graph_engine.py to perform these checks or refactor the adapter factories so that invalid kwargs are not passed through.

cognee/infrastructure/llm/utils.py (1)
41-68: Function name/semantics mismatch: returning context tokens from LiteLLM under a 'completion' name.

litellm.model_cost[model]["max_tokens"] typically denotes the context window, not the completion/output cap. Either:
- rename to get_model_max_context_tokens, or
- fetch the correct completion/output limit (if available) and fall back with a clear log.
```diff
-def get_model_max_completion_tokens(model_name: str):
+def get_model_max_context_tokens(model_name: str):
 @@
-    max_completion_tokens = None
+    max_context_tokens = None
 @@
-    if model_name in litellm.model_cost:
-        max_completion_tokens = litellm.model_cost[model_name]["max_tokens"]
-        logger.debug(f"Max input tokens for {model_name}: {max_completion_tokens}")
+    if model_name in litellm.model_cost:
+        max_context_tokens = litellm.model_cost[model_name].get("max_tokens")
+        logger.debug(f"Max context tokens for {model_name}: {max_context_tokens}")
     else:
         logger.info("Model not found in LiteLLM's model_cost.")
-    return max_completion_tokens
+    return max_context_tokens
```

Additionally, to preserve the current API until call sites are updated, add this shim outside the changed region:
```python
# Backward-compat shim; consider deprecating.
def get_model_max_completion_tokens(model_name: str):
    logger.debug(
        "get_model_max_completion_tokens() currently returns context window size; "
        "consider using get_model_max_context_tokens()."
    )
    return get_model_max_context_tokens(model_name)
```

cognee/infrastructure/llm/config.py (1)
189-194: Bug: @lru_cache without parentheses decorates incorrectly.

This turns the function into a decorator factory at import time.
```diff
-@lru_cache
+@lru_cache()
 def get_llm_config():
```

cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (3)
118-141: Missed await leads to coroutine misuse in has_node.
self.query(...) isn't awaited; it always returns a coroutine, which is then subscripted.

```diff
-        results = self.query(
+        results = await self.query(
             f"""
                 MATCH (n:`{BASE_LABEL}`)
                 WHERE n.id = $node_id
                 RETURN COUNT(n) > 0 AS node_exists
             """,
             {"node_id": node_id},
         )
         return results[0]["node_exists"] if len(results) > 0 else False
```
315-327: has_edge returns wrong count variable.
relationship isn't bound; count r (or *) instead.

```diff
-            MATCH (from_node: `{BASE_LABEL}`)-[:`{edge_label}`]->(to_node: `{BASE_LABEL}`)
+            MATCH (from_node: `{BASE_LABEL}`)-[r:`{edge_label}`]->(to_node: `{BASE_LABEL}`)
             WHERE from_node.id = $from_node_id AND to_node.id = $to_node_id
-            RETURN COUNT(relationship) > 0 AS edge_exists
+            RETURN COUNT(r) > 0 AS edge_exists
```
344-361: has_edges compares internal IDs; should use node id property.
id(a)/id(b) are internal ints; you pass string IDs.

```diff
-            WHERE id(a) = edge.from_node AND id(b) = edge.to_node AND type(r) = edge.relationship_name
+            WHERE a.id = edge.from_node AND b.id = edge.to_node AND type(r) = edge.relationship_name
```

cognee/infrastructure/llm/tokenizer/TikToken/adapter.py (1)
13-26: Backcompat for max_tokens and safer tokenizer init.

The param/method rename is a breaking change. Provide a deprecation path and guard unknown models.

```diff
-    def __init__(
-        self,
-        model: Optional[str] = None,
-        max_completion_tokens: int = 8191,
-    ):
+    def __init__(
+        self,
+        model: Optional[str] = None,
+        max_completion_tokens: int = 8191,
+        max_tokens: Optional[int] = None,  # deprecated
+    ):
         self.model = model
-        self.max_completion_tokens = max_completion_tokens
+        # Backward compatibility: prefer explicit max_completion_tokens; otherwise honor deprecated max_tokens
+        if max_tokens is not None and max_completion_tokens == 8191:
+            warnings.warn(
+                "max_tokens is deprecated; use max_completion_tokens",
+                DeprecationWarning,
+                stacklevel=2,
+            )
+            max_completion_tokens = max_tokens
+        self.max_completion_tokens = max_completion_tokens
         # Initialize TikToken for GPT based on model
-        if model:
-            self.tokenizer = tiktoken.encoding_for_model(self.model)
-        else:
-            # Use default if model not provided
-            self.tokenizer = tiktoken.get_encoding("cl100k_base")
+        if model:
+            try:
+                self.tokenizer = tiktoken.encoding_for_model(self.model)
+            except Exception:
+                # Unknown/unsupported model: fall back to default encoding
+                self.tokenizer = tiktoken.get_encoding("cl100k_base")
+        else:
+            self.tokenizer = tiktoken.get_encoding("cl100k_base")
```

Add import (outside this hunk):

```python
import warnings
```

cognee/shared/logging_utils.py (1)
345-360: Bug: hasattr check uses module name instead of string literal.

This condition never matches; use "__name__".

```diff
-        if hasattr(exc_type, __name__):
+        if hasattr(exc_type, "__name__"):
             event_dict["exception_type"] = exc_type.__name__
```

cognee/shared/exceptions/exceptions.py (1)
5-12: Unify IngestionError definitions into a single canonical class

We currently have two separate IngestionError classes with divergent defaults and locations:

cognee/shared/exceptions/exceptions.py
• Default message: "Failed to load data."
• Default status: HTTP_422_UNPROCESSABLE_ENTITY

cognee/modules/ingestion/exceptions/exceptions.py
• Default message: "Type of data sent to classify not supported."
• Default status: HTTP_415_UNSUPPORTED_MEDIA_TYPE

All ingestion-related code (in cognee/tasks/... and cognee/modules/ingestion/...) imports the modules variant today, but having two definitions risks drifting defaults and inconsistent error handling. Let's consolidate to a single class in shared and have modules re-export it.

Files requiring updates:
• cognee/shared/exceptions/exceptions.py
• cognee/modules/ingestion/exceptions/__init__.py
• remove or archive cognee/modules/ingestion/exceptions/exceptions.py

Proposed changes:

```diff
--- a/cognee/shared/exceptions/exceptions.py
+++ b/cognee/shared/exceptions/exceptions.py
 class IngestionError(CogneeValidationError):
     def __init__(
         self,
-        message: str = "Failed to load data.",
-        name: str = "IngestionError",
-        status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
+        message: str = "Type of data sent to classify not supported.",
+        name: str = "IngestionError",
+        status_code=status.HTTP_415_UNSUPPORTED_MEDIA_TYPE,
     ):
         super().__init__(message, name, status_code)
```

```diff
--- a/cognee/modules/ingestion/exceptions/__init__.py
+++ b/cognee/modules/ingestion/exceptions/__init__.py
-from .exceptions import IngestionError
+from cognee.shared.exceptions import IngestionError
```

• Delete or archive cognee/modules/ingestion/exceptions/exceptions.py
• Review all call sites that depended on the old 422 default (e.g., in infer_data_ontology.py) and add an explicit status_code=status.HTTP_422_UNPROCESSABLE_ENTITY where still desired.

This ensures a single source of truth for IngestionError and prevents future semantic drift.
cognee/modules/graph/cognee_graph/CogneeGraph.py (1)
124-126: Duplicate skeleton-edge registration (adds edges twice to nodes).

add_edge() already calls node1/node2.add_skeleton_edge; doing it again here double-counts degrees and may skew algorithms.

```diff
-        self.add_edge(edge)
-
-        source_node.add_skeleton_edge(edge)
-        target_node.add_skeleton_edge(edge)
+        self.add_edge(edge)
```

cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
380-385: Empty string or empty vector slips past guard and crashes later.

query_text="" with query_vector=None bypasses the None-only check; then query_embeddings=[None] causes a runtime error. Treat blank strings and empty vectors as missing.

```diff
-        if query_text is None and query_vector is None:
-            raise MissingQueryParameterError()
-
-        if query_text and not query_vector:
-            query_vector = (await self.embedding_engine.embed_text([query_text]))[0]
+        # Guard: require either non-blank text or a non-empty vector
+        text_missing = (query_text is None) or (
+            isinstance(query_text, str) and query_text.strip() == ""
+        )
+        vector_missing = (query_vector is None) or (isinstance(query_vector, list) and len(query_vector) == 0)
+        if text_missing and vector_missing:
+            raise MissingQueryParameterError()
+
+        # If valid text is provided and vector missing, embed it
+        if (query_text is not None and isinstance(query_text, str) and query_text.strip() != "") and vector_missing:
+            query_vector = (await self.embedding_engine.embed_text([query_text]))[0]
```
5-12: Update PipelineRunFailedError to default HTTP 500The
PipelineRunFailedErrorclass inherits fromCogneeSystemError, which by design represents server‐side failures and defaults to HTTP 500. However, its defaultstatus_codeis currently set to 422 (Unprocessable Entity), which conflicts with the semantics of a system error. No callers or tests explicitly depend on a 422 for this exception, so it should be updated to 500.• File:
cognee/modules/pipelines/exceptions/exceptions.py
• Lines: 8–12Apply the following change:
class PipelineRunFailedError(CogneeSystemError): def __init__( self, message: str = "Pipeline run failed.", name: str = "PipelineRunFailedError", - status_code: int = status.HTTP_422_UNPROCESSABLE_ENTITY, + status_code: int = status.HTTP_500_INTERNAL_SERVER_ERROR, ): super().__init__(message, name, status_code)cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/ollama/adapter.py (1)
46-49: Blocking sync client used inside async methods; also missing max_completion_tokens in structured output call.
- Async methods call the sync OpenAI client, likely blocking the event loop.
- acreate_structured_output doesn’t pass max_completion_tokens.
Proposed fix (use AsyncOpenAI and await API calls; propagate token limit):
```diff
-    from openai import OpenAI
+    from openai import AsyncOpenAI
 @@
-        self.aclient = instructor.from_openai(
-            OpenAI(base_url=self.endpoint, api_key=self.api_key), mode=instructor.Mode.JSON
-        )
+        self.aclient = instructor.from_openai(
+            AsyncOpenAI(base_url=self.endpoint, api_key=self.api_key), mode=instructor.Mode.JSON
+        )
 @@
-        response = self.aclient.chat.completions.create(
+        response = await self.aclient.chat.completions.create(
             model=self.model,
             messages=[
                 {
                     "role": "user",
                     "content": f"{text_input}",
                 },
                 {
                     "role": "system",
                     "content": system_prompt,
                 },
             ],
             max_retries=5,
+            max_completion_tokens=self.max_completion_tokens,
             response_model=response_model,
         )
 @@
-        transcription = self.aclient.audio.transcriptions.create(
+        transcription = await self.aclient.audio.transcriptions.create(
             model="whisper-1",  # Ensure the correct model for transcription
             file=audio_file,
             language="en",
         )
 @@
-        response = self.aclient.chat.completions.create(
+        response = await self.aclient.chat.completions.create(
             model=self.model,
             messages=[
                 {
                     "role": "user",
                     "content": [
                         {"type": "text", "text": "What's in this image?"},
                         {
                             "type": "image_url",
                             "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"},
                         },
                     ],
                 }
             ],
-            max_completion_tokens=300,
+            max_completion_tokens=self.max_completion_tokens,
         )
```

Also applies to: 75-92, 113-119, 150-165
cognee/infrastructure/llm/tokenizer/Gemini/adapter.py (1)
80-83: Token counting is incorrect (using embedding length); use GenerativeModel.count_tokens

len(embed_content(...)) returns embedding vector length, not tokens. This will miscount and break limits.
Apply:
```diff
-        return len(genai.embed_content(model=f"models/{self.model}", content=text))
+        # Use GenerativeModel.count_tokens for accurate token accounting
+        model_name = self.model if self.model.startswith("models/") else f"models/{self.model}"
+        model = genai.GenerativeModel(model_name=model_name)
+        return int(model.count_tokens(text).total_tokens)
```
41-61: Handle S3FileSystemNotFoundError explicitly to prevent unhandled pipeline crashesRaising S3FileSystemNotFoundError in resolve_data_directories.py currently bubbles up through the generic
except Exceptionin run_tasks.py and is re-thrown (since it’s not a PipelineRunFailedError). To ensure the pipeline surface a clean failure event rather than crash:• In
cognee/modules/pipelines/operations/run_tasks.py, before the broadexcept Exception, add a specific handler for S3FileSystemNotFoundError that converts it into a PipelineRunFailedError (so it’s caught and reported as an ingestion failure).
• Alternatively, catch S3FileSystemNotFoundError there and log or wrap it with a user-friendly message.Example diff in run_tasks.py:
try: … - except Exception as error: + except S3FileSystemNotFoundError as error: + # Surface S3 errors as ingestion failures without crashing + raise PipelineRunFailedError( + message=f"S3 filesystem not found for data path: {error}" + ) + except Exception as error: await log_pipeline_run_error( pipeline_run_id, pipeline_id, pipeline_name, dataset_id, data, error ) …This ensures S3 filesystem issues are reported as pipeline failures (yielding a PipelineRunErrored event) rather than causing an unhandled exception.
cognee/infrastructure/databases/exceptions/__init__.py (1)
7-14: Make re-exports explicit to satisfy ruff F401 and clarify API surface

Add __all__ to declare the public API for this package.
Additional change (outside the selected lines):
```python
__all__ = [
    "EntityNotFoundError",
    "EntityAlreadyExistsError",
    "DatabaseNotCreatedError",
    "EmbeddingException",
    "MissingQueryParameterError",
    "MutuallyExclusiveQueryParametersError",
]
```

cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (3)
288-298: Params defined but not used in query; inline interpolation risks injection and breaks binding.

Limit, embedding, and collection_name are f-string inlined while a params map is passed. Use parameter placeholders.
Apply:
```diff
-        # Compose the parameters map
-        params = dict(embedding=embedding, param_topk=limit)
+        # Compose the parameters map
+        params = dict(embedding=embedding, param_topk=limit, collection_name=collection_name)
         # Compose the query
         query_string = f"""
         CALL neptune.algo.vectors.topKByEmbeddingWithFiltering({{
-            topK: {limit},
-            embedding: {embedding},
-            nodeFilter: {{ equals: {{property: '{self._COLLECTION_PREFIX}', value: '{collection_name}'}} }}
+            topK: $param_topk,
+            embedding: $embedding,
+            nodeFilter: {{ equals: {{property: '{self._COLLECTION_PREFIX}', value: $collection_name}} }}
         }}
         )
         YIELD node, score
         """
```
220-224: Wrong identifier used in WHERE; retrieval will miss nodes created with ~id.

You MERGE by property ~id but filter by id(n). Filter by the same property.

```diff
-            f"MATCH( n :{self._VECTOR_NODE_LABEL}) "
-            f"WHERE id(n) in $node_ids AND "
+            f"MATCH( n :{self._VECTOR_NODE_LABEL}) "
+            f"WHERE n.`~id` IN $node_ids AND "
             f"n.{self._COLLECTION_PREFIX} = $collection_name "
             f"RETURN n as payload "
```
363-366: Same identifier mismatch on delete; may fail to delete intended nodes.

Align with ~id-based identity.

```diff
-            f"WHERE id(n) IN $node_ids "
+            f"WHERE n.`~id` IN $node_ids "
             f"AND n.{self._COLLECTION_PREFIX} = $collection_name "
             f"DETACH DELETE n"
```

cognee/api/v1/datasets/routers/get_datasets_router.py (2)
333-341: Bug: treating list as None; empty list will cause IndexError at dataset[0].

Check emptiness, not None.

```diff
-    if dataset is None:
+    if not dataset:
         return JSONResponse(
             status_code=404,
             content=ErrorResponseDTO(f"Dataset ({str(dataset_id)}) not found."),
         )
     dataset_data = await get_dataset_data(dataset_id=dataset[0].id)
```
431-438: Same bug in raw data route: list vs None.

Fix emptiness check to avoid 500s.

```diff
-    if dataset is None:
+    if not dataset:
         return JSONResponse(
             status_code=404, content={"detail": f"Dataset ({dataset_id}) not found."}
         )
```

cognee/infrastructure/data/utils/extract_keywords.py (1)
27-35: Harden empty/whitespace input and nounless-text handling to avoid TF-IDF crashes.

Current logic raises for empty string only; TF-IDF will also fail on inputs that yield no nouns.

```diff
-    if len(text) == 0:
-        raise KeywordExtractionError()
+    if not text or not text.strip():
+        raise KeywordExtractionError()
 @@
     nouns = [word for (word, tag) in tags if tag == "NN"]
+    if not nouns:
+        raise KeywordExtractionError("No nouns found in the input text.")
```

cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/anthropic/adapter.py (1)
94-102: Ensure show_prompt always returns str (avoid None) and handle empty files

LLMGateway.read_query_prompt may return falsy/empty content; the function would return None while annotated as str. Either raise or return a valid string consistently.
Suggested change:
```diff
-    system_prompt = LLMGateway.read_query_prompt(system_prompt)
-
-    formatted_prompt = (
-        f"""System Prompt:\n{system_prompt}\n\nUser Input:\n{text_input}\n"""
-        if system_prompt
-        else None
-    )
-
-    return formatted_prompt
+    prompt_text = LLMGateway.read_query_prompt(system_prompt)
+    if not prompt_text:
+        # Empty or unreadable prompt file = misconfiguration
+        raise MissingSystemPromptPathError()
+
+    return f"System Prompt:\n{prompt_text}\n\nUser Input:\n{text_input}\n"
```

cognee-frontend/src/ui/Partials/SearchView/SearchView.tsx (1)
17-19: Type-safety: add searchType to SearchFormPayload

formElements.searchType is used below but not declared on SearchFormPayload; this can fail type-checking.
Apply:
```diff
 interface SearchFormPayload extends HTMLFormElement {
   chatInput: HTMLInputElement;
+  searchType: HTMLSelectElement;
 }
```

cognee/api/v1/config/config.py (2)
151-157: Use InvalidConfigAttributeError instead of AttributeError for consistency.

All other setters raise InvalidConfigAttributeError; this one still raises AttributeError. Align for uniform behavior and HTTP status mapping.
Apply this diff:
```diff
     for key, value in config_dict.items():
         if hasattr(graph_db_config, key):
             object.__setattr__(graph_db_config, key, value)
         else:
-            raise AttributeError(f"'{key}' is not a valid attribute of the config.")
+            raise InvalidConfigAttributeError(attribute=key)
```
163-169: Bug: exception is instantiated but never raised.

Unknown vector DB config keys are silently ignored; should raise.
Apply this diff:
```diff
     for key, value in config_dict.items():
         if hasattr(vector_db_config, key):
             object.__setattr__(vector_db_config, key, value)
         else:
-            InvalidConfigAttributeError(attribute=key)
+            raise InvalidConfigAttributeError(attribute=key)
```

cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
89-90: Harden HTTP call and reduce session overhead.

Each prompt opens a new ClientSession and ignores non-2xx/JSON/key errors. Reuse one session and validate responses.
Apply:
```diff
-        embeddings = await asyncio.gather(*[self._get_embedding(prompt) for prompt in text])
+        async with aiohttp.ClientSession() as session:
+            tasks = [self._get_embedding(prompt, session) for prompt in text]
+            embeddings = await asyncio.gather(*tasks)
         return embeddings
 @@
-    async def _get_embedding(self, prompt: str) -> List[float]:
+    async def _get_embedding(self, prompt: str, session: aiohttp.ClientSession) -> List[float]:
 @@
-        async with aiohttp.ClientSession() as session:
-            async with session.post(
-                self.endpoint, json=payload, headers=headers, timeout=60.0
-            ) as response:
-                data = await response.json()
-                return data["embedding"]
+        async with session.post(self.endpoint, json=payload, headers=headers, timeout=60.0) as response:
+            response.raise_for_status()
+            data = await response.json(content_type=None)
+            embedding = data.get("embedding")
+            if not isinstance(embedding, list):
+                raise ValueError("Ollama embeddings response missing 'embedding' list")
+            return embedding
```

Also applies to: 106-112
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/gemini/adapter.py (1)
52-76: Return type can be str; align annotation.

acreate_structured_output returns str when response_model is str. Update annotation.

```diff
-from typing import Type, Optional
+from typing import Type, Optional, Union
 @@
-    ) -> BaseModel:
+    ) -> Union[BaseModel, str]:
```

cognee/api/v1/search/search.py (1)
174-176: Bug: stray list brackets make the type-check condition always truthy.

This forces name→UUID resolution even when datasets aren't strings.

```diff
-    if datasets is not None and [all(isinstance(dataset, str) for dataset in datasets)]:
+    if datasets is not None and all(isinstance(dataset, str) for dataset in datasets):
```

cognee/modules/data/exceptions/exceptions.py (1)
8-16: Name mismatch in UnstructuredLibraryImportError; adjust status code.
The default name is "UnstructuredModuleImportError", which doesn't match the class. Also, 422 is odd for a configuration/import error; 500 fits better.class UnstructuredLibraryImportError(CogneeConfigurationError): def __init__( self, message: str = "Import error. Unstructured library is not installed.", - name: str = "UnstructuredModuleImportError", - status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, + name: str = "UnstructuredLibraryImportError", + status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, ): super().__init__(message, name, status_code)cognee/modules/retrieval/exceptions/exceptions.py (1)
15-22: Reconsider HTTP status code for system errors
CypherSearchErrorinherits fromCogneeSystemErrorwhich defaults to HTTP 500 (internal server error), but you're overriding it with HTTP 400 (bad request). This seems contradictory - if it's truly a system error during query execution, HTTP 500 is more appropriate. If it's actually a client error (e.g., malformed query), consider usingCogneeValidationErrorinstead.class CypherSearchError(CogneeSystemError): def __init__( self, message: str = "An error occurred during the execution of the Cypher query.", name: str = "CypherSearchError", - status_code: int = status.HTTP_400_BAD_REQUEST, + status_code: int = status.HTTP_500_INTERNAL_SERVER_ERROR, ): super().__init__(message, name, status_code)Or if this is actually a validation error:
-class CypherSearchError(CogneeSystemError): +class CypherSearchError(CogneeValidationError): def __init__( self, message: str = "An error occurred during the execution of the Cypher query.", name: str = "CypherSearchError", status_code: int = status.HTTP_400_BAD_REQUEST, ): super().__init__(message, name, status_code)cognee/tests/test_search_db.py (1)
1-3: Convert test_search_db.py into a pytest-discoverable async testCurrently,
cognee/tests/test_search_db.pyonly defines anasync main()behind a__main__guard, so pytest will skip it (no top-leveltest_functions). To have this exercise run in CI, refactor it into a proper pytest test:• Extract the core logic from
async def main()into a function named with thetest_prefix at module scope
• Decorate it with@pytest.mark.asyncio(or usepytest_asyncio’s fixture)
• Remove or isolate the__main__invocation (you can keep it in a separate CLI script)Example diff:
import os import pathlib -async def main(): +import pytest + +@pytest.mark.asyncio +async def test_search_db(tmp_path): # …existing search-DB logic (use tmp_path or fixtures instead of hard-coded paths) results = await run_search(os.getcwd(), pathlib.Path(“…”)) assert results == expected -if __name__ == '__main__': - import asyncio - asyncio.run(main()) +## Optional: keep this as a CLI entrypoint in a separate script: +# if __name__ == '__main__': +# import asyncio +# asyncio.run(main())This change will ensure pytest picks up and executes your search-DB logic in CI.
cognee/modules/search/methods/search.py (1)
88-94: Fix IndexError risk when logging empty results.

When no results are returned, search_results[0] will raise. Also, log shape differs from authorized path. Always serialize the list.

Apply this diff:

```diff
-            await log_result(
-                query.id,
-                json.dumps(
-                    search_results if len(search_results) > 1 else search_results[0], cls=JSONEncoder
-                ),
-                user.id,
-            )
+            await log_result(
+                query.id,
+                json.dumps(search_results, cls=JSONEncoder),
+                user.id,
+            )
```

cognee/infrastructure/databases/exceptions/exceptions.py (2)
40-50: Call base initializer to preserve exception semantics.Directly setting attributes bypasses CogneeApiError init (args, logging controls). Use super(..., log=False) to keep no-log behavior.
Apply this diff:
def __init__( self, message: str = "The requested entity does not exist.", name: str = "EntityNotFoundError", status_code=status.HTTP_404_NOT_FOUND, ): - self.message = message - self.name = name - self.status_code = status_code - # super().__init__(message, name, status_code) :TODO: This is not an error anymore with the dynamic exception handling therefore we shouldn't log error + super().__init__(message, name, status_code, log=False)
78-87: Initialize via base class and align name casing.
- Same super(...) concern as above; also
nameis "NodeSet..." while class is "Nodeset...".Apply this diff:
def __init__( self, message: str = "The nodeset filter is not supported in the current graph database.", - name: str = "NodeSetFilterNotSupportedError", + name: str = "NodesetFilterNotSupportedError", status_code=status.HTTP_404_NOT_FOUND, ): - self.message = message - self.name = name - self.status_code = status_code + super().__init__(message, name, status_code, log=False)
```diff
 LLM_API_KEY="your_api_key"
-LLM_MODEL="openai/gpt-4o-mini"
+LLM_MODEL="openai/gpt-5-mini"
```
Critical: env var names still use ..._MAX_TOKENS; update to ..._MAX_COMPLETION_TOKENS to match code refactor.
Otherwise these settings will be ignored by the new config.
```diff
-LLM_MAX_TOKENS="16384"
+LLM_MAX_COMPLETION_TOKENS="16384"
@@
-EMBEDDING_MAX_TOKENS=8191
+EMBEDDING_MAX_COMPLETION_TOKENS=8191
```

Also applies to: 23-23, 31-31
🧰 Tools
🪛 dotenv-linter (3.3.0)
[warning] 19-19: [QuoteCharacter] The value has quote characters (', ")
(QuoteCharacter)
🤖 Prompt for AI Agents
In .env.template around lines 19, 23 and 31, the environment variable names
still use the old suffix _MAX_TOKENS and will be ignored by the refactored
config; rename each occurrence to use the new suffix _MAX_COMPLETION_TOKENS
(preserve the assigned values and any surrounding quotes), and update any
adjacent comments or examples to reflect the new variable names so they match
the code references.
```python
if __name__ == "__main__":
    main()
```
Propagate CLI exit codes.
main() returns an int; without sys.exit the process always exits 0 when run via python -m cognee.
```diff
 if __name__ == "__main__":
-    main()
+    raise SystemExit(main())
```
🤖 Prompt for AI Agents
In cognee/__main__.py around lines 3 to 4, the script calls main() directly so
its integer return value is ignored and the process always exits with code 0;
replace the direct call with sys.exit(main()) and ensure sys is imported (e.g.,
add "import sys" at the top) so the program propagates main()'s returned exit
code to the OS.
```python
    save_interaction: bool = False,
    last_k: Optional[int] = None,
) -> list:
```
Document last_k in the public API.
last_k is in the signature but not documented.
```diff
     Args:
 @@
         save_interaction: Save interaction (query, context, answer connected to triplet endpoints) results into the graph or not
+        last_k: Limit context to the last K interactions when saving/logging feedback (None means provider default)
```

Also applies to: 112-113
🤖 Prompt for AI Agents
In cognee/api/v1/search/search.py around lines 22-24 (and similarly at 112-113),
the public API docstrings are missing documentation for the last_k parameter;
update the function/class docstring to include a clear :param last_k:
description (type Optional[int], default None) explaining that when set it
limits the number of recent interactions/records considered (or whatever the
function semantics are) and that None means no limit, and include any edge-case
behavior (e.g., must be >0). Ensure the docs follow the existing docstring style
and appear alongside the other parameter docs so both occurrences are documented
consistently.
```python
            await cognee.delete(dataset_name=None, user_id=args.user_id)
        else:
            await cognee.delete(dataset_name=args.dataset_name, user_id=args.user_id)
    except Exception as e:
```
💡 Verification agent
🧩 Analysis chain
Confirm cognee.delete signature (lint flags unexpected kwargs)
Static analysis hints that dataset_name/user_id may not be valid parameters. Please verify and adjust calls accordingly (e.g., dataset_id, data_id).
🏁 Script executed:

```shell
#!/bin/bash
# Find the delete API surface and its parameters
rg -nP '^(async\s+)?def\s+delete\s*\(' -g 'cognee/**' -C2
rg -n '\bcognee\.delete\(' -g 'cognee/**' -C2
```

Length of output: 5288
🏁 Script executed:

```shell
#!/bin/bash
# Display the CLI delete command to see how `cognee` is imported and which delete signature is used
sed -n '1,200p' cognee/cli/commands/delete_command.py
```

Length of output: 3215
🏁 Script executed:

```shell
#!/bin/bash
# Inspect the API delete function signature and top-level cognee delete export
sed -n '1,50p' cognee/api/v1/delete/delete.py
sed -n '1,50p' cognee/__init__.py
```

Length of output: 2794
Fix CLI delete calls to match cognee.delete signature
The CLI currently invokes:
```python
await cognee.delete(dataset_name=…, user_id=…)
```

but the API function is defined as:

```python
async def delete(
    data_id: UUID,
    dataset_id: UUID,
    mode: str = "soft",
    user: User = None,
):
```

There are no dataset_name or user_id parameters. This will raise unexpected-keyword-argument errors at runtime.
Please update cognee/cli/commands/delete_command.py (around lines 68–71) to:
- Resolve the dataset name (if provided) to its UUID before calling the API.
- Pass the data identifier(s) and dataset_id to cognee.delete.
- Supply the CLI's --user-id value as the user argument (after loading the corresponding User object).
Example diff:
```diff
-    if args.all:
-        await cognee.delete(dataset_name=None, user_id=args.user_id)
-    else:
-        await cognee.delete(dataset_name=args.dataset_name, user_id=args.user_id)
+    # Determine target dataset UUID
+    dataset_uuid = await resolve_dataset_uuid(name=args.dataset_name)
+    # Determine target data IDs (e.g., all data for user or dataset)
+    data_ids = await resolve_data_ids(dataset_uuid, user_id=args.user_id, all=args.all)
+    # Perform deletion for each data point
+    for data_id in data_ids:
+        await cognee.delete(
+            data_id=data_id,
+            dataset_id=dataset_uuid,
+            user=await load_user(args.user_id),
+        )
```

- Ensure helper functions like resolve_dataset_uuid, resolve_data_ids, and load_user are implemented or imported.
- Adjust prompts, logging, and loop control as needed.
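For orientation, one way the resolve_dataset_uuid helper could look; both the helper itself (named only in the suggestion above) and the dataset-listing call it wraps are hypothetical assumptions about cognee internals, not existing API:

```python
# Hypothetical helper sketch; get_datasets location/signature and the
# dataset attributes (name, id) are assumptions for illustration.
from typing import Optional
from uuid import UUID

async def resolve_dataset_uuid(name: Optional[str]) -> Optional[UUID]:
    if name is None:
        return None
    from cognee.modules.data.methods import get_datasets  # assumed location
    datasets = await get_datasets()  # assumed signature
    for dataset in datasets:
        if dataset.name == name:
            return dataset.id
    raise ValueError(f"Dataset '{name}' not found")
```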
🧰 Tools
🪛 Pylint (3.3.7)
[error] 68-68: Unexpected keyword argument 'dataset_name' in function call
(E1123)
[error] 68-68: Unexpected keyword argument 'user_id' in function call
(E1123)
[error] 68-68: No value for argument 'data_id' in function call
(E1120)
[error] 68-68: No value for argument 'dataset_id' in function call
(E1120)
[error] 70-70: Unexpected keyword argument 'dataset_name' in function call
(E1123)
[error] 70-70: Unexpected keyword argument 'user_id' in function call
(E1123)
[error] 70-70: No value for argument 'data_id' in function call
(E1120)
[error] 70-70: No value for argument 'dataset_id' in function call
(E1120)
🤖 Prompt for AI Agents
In cognee/cli/commands/delete_command.py around lines 68 to 71, the CLI is
calling cognee.delete with non‑existent keyword args (dataset_name, user_id);
change it to resolve any provided dataset name into a dataset_id UUID, resolve
the data identifiers to one or more data_id UUIDs, load the User object from the
provided --user-id and pass it as the user argument, then call
cognee.delete(data_id=<data_id>, dataset_id=<dataset_id>, mode=<mode>,
user=<User>) for each target data_id; ensure helper functions like
resolve_dataset_uuid, resolve_data_ids, and load_user are imported/used and
update prompts/logging/looping to iterate over multiple data_ids if returned.
```python
SEARCH_TYPE_CHOICES = [
    "GRAPH_COMPLETION",
    "RAG_COMPLETION",
    "INSIGHTS",
    "CHUNKS",
    "SUMMARIES",
    "CODE",
    "CYPHER",
]
```
💡 Verification agent
🧩 Analysis chain
Potentially missing FEEDBACK search type; verify and add if applicable.
If SearchType includes FEEDBACK (used by UserQAFeedback), CLI should expose it here to avoid invalid option errors.
```diff
 SEARCH_TYPE_CHOICES = [
     "GRAPH_COMPLETION",
     "RAG_COMPLETION",
     "INSIGHTS",
     "CHUNKS",
     "SUMMARIES",
     "CODE",
     "CYPHER",
+    "FEEDBACK",
 ]
```

Run to confirm canonical search types:
🏁 Script executed:

```shell
#!/bin/bash
rg -nP -C2 '\bclass\s+SearchType\b|Enum' -g 'cognee/**' ; rg -nP '\bFEEDBACK\b' -g 'cognee/**'
```

Length of output: 14256
Include FEEDBACK in SEARCH_TYPE_CHOICES
The SearchType enum in cognee/modules/search/types/SearchType.py defines a FEEDBACK member, and it’s referenced in both the search implementation and tests. To prevent CLI users from encountering invalid-option errors when selecting FEEDBACK, the SEARCH_TYPE_CHOICES list in cognee/cli/config.py must be updated accordingly.
• Location needing update:
- cognee/cli/config.py, lines 19–27
• Verified evidence:
- FEEDBACK is defined in the enum: cognee/modules/search/types/SearchType.py:17
- SearchType.FEEDBACK is used in search methods/tests: cognee/modules/search/methods/search.py:148; cognee/tests/test_search_db.py
```diff
--- a/cognee/cli/config.py
+++ b/cognee/cli/config.py
@@ SEARCH_TYPE_CHOICES = [
     "CYPHER",
+    "FEEDBACK",
 ]
```
🤖 Prompt for AI Agents
In cognee/cli/config.py around lines 19 to 27, the SEARCH_TYPE_CHOICES list is
missing the "FEEDBACK" option; update the array to include "FEEDBACK" so CLI
choices match the SearchType enum and avoid invalid-option errors when users
select FEEDBACK.
```python
    @patch("cognee.add")
    def test_add_command_integration(self, mock_add):
        """Test add command integration"""
        mock_add.return_value = None
```
🛠️ Refactor suggestion
This patching doesn’t affect the subprocess
@patch("cognee.add") runs in the parent process; the CLI is executed in a separate Python process, so the mock never applies. Remove the patch or switch this test to call the command implementation directly.
```diff
-    @patch("cognee.add")
-    def test_add_command_integration(self, mock_add):
+    def test_add_command_integration(self):
         """Test add command integration"""
-        mock_add.return_value = None
```
🤖 Prompt for AI Agents
In cognee/tests/integration/cli/test_cli_integration.py around lines 67 to 71,
the test decorates with @patch("cognee.add") which only applies in the parent
process and does not affect the CLI run in a subprocess; remove the @patch
decorator and its mock usage when invoking the CLI via subprocess, or
alternatively change the test to call the command implementation directly
(import cognee.add and call it with appropriate args and asserts) so the mock
can be applied; update assertions to match the chosen approach (subprocess:
assert output/exit code, direct call: assert mock called or returned value).
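For the in-process alternative mentioned above, a hedged sketch follows; the entrypoint module cognee.cli._cognee is taken from this PR's workflow, but passing an argv list directly to main() is an assumption about its signature:

```python
# Hedged sketch: run the CLI in-process so unittest.mock patches apply.
# main(argv) accepting an argument list is an assumption, not verified API.
from unittest.mock import patch

def test_add_command_in_process(tmp_path):
    sample = tmp_path / "sample.txt"
    sample.write_text("hello")

    with patch("cognee.add") as mock_add:
        mock_add.return_value = None
        from cognee.cli._cognee import main
        # Same process, so the patch above is visible to the command
        exit_code = main(["add", str(sample)])

    assert exit_code == 0
    mock_add.assert_called_once()
```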
```python
from cognee.cli.commands.cognify_command import CognifyCommand
from cognee.cli.commands.delete_command import DeleteCommand
from cognee.cli.commands.config_command import ConfigCommand
from cognee.cli.exceptions import CliCommandException, CliCommandInnerException
```
Remove unused import
The CliCommandInnerException import is not used anywhere in this file.
```diff
 from cognee.cli.commands.delete_command import DeleteCommand
 from cognee.cli.commands.config_command import ConfigCommand
-from cognee.cli.exceptions import CliCommandException, CliCommandInnerException
+from cognee.cli.exceptions import CliCommandException
```
🧰 Tools
🪛 Ruff (0.12.2)
15-15: cognee.cli.exceptions.CliCommandInnerException imported but unused
Remove unused import: cognee.cli.exceptions.CliCommandInnerException
(F401)
🤖 Prompt for AI Agents
In cognee/tests/unit/cli/test_cli_commands.py around line 15, the import
statement includes CliCommandInnerException which is unused; remove
CliCommandInnerException from the import list so only CliCommandException is
imported to eliminate the unused-import warning.
```python
        from cognee.modules.chunking.TextChunker import TextChunker

        mock_cognee.cognify.assert_awaited_once_with(
            datasets=None,
            chunk_size=None,
            ontology_file_path=None,
            chunker=TextChunker,
            run_in_background=False,
        )
```
🛠️ Refactor suggestion
Import statement should be at module level
The import of TextChunker on line 232 should be at the top of the file with other imports, not inside the test method.
Add at the top of the file:
```diff
 from cognee.cli.commands.cognify_command import CognifyCommand
 from cognee.cli.commands.delete_command import DeleteCommand
 from cognee.cli.commands.config_command import ConfigCommand
 from cognee.cli.exceptions import CliCommandException
+from cognee.modules.chunking.TextChunker import TextChunker
```

And remove the import from within the test:
```diff
         mock_asyncio_run.assert_called_once()
         assert asyncio.iscoroutine(mock_asyncio_run.call_args[0][0])
-        from cognee.modules.chunking.TextChunker import TextChunker
         mock_cognee.cognify.assert_awaited_once_with(
```
🤖 Prompt for AI Agents
In cognee/tests/unit/cli/test_cli_commands.py around lines 232 to 240, the test
contains a local import "from cognee.modules.chunking.TextChunker import
TextChunker" which should be moved to the module-level imports; remove the
in-test import and add the same import statement to the top of the file with the
other imports so the test uses the module-level TextChunker reference and keep
the assertion unchanged.
def test_multiple_debug_enable_calls(self):
    """Test multiple calls to enable_debug"""
    debug.enable_debug()
    debug.enable_debug()  # Should not cause issues
    assert debug.is_debug_enabled() is True

    # Reset for other tests
    debug._debug_enabled = False
Bug: resetting the wrong debug flag
debug.enable_debug() sets _DEBUG_FLAG; the test resets _debug_enabled, which has no effect. Reset the correct flag (or expose a reset helper).
-    # Reset for other tests
-    debug._debug_enabled = False
+    # Reset for other tests
+    debug._DEBUG_FLAG = False  # noqa: SLF001 (okay in tests)
🤖 Prompt for AI Agents
In cognee/tests/unit/cli/test_cli_utils.py around lines 120 to 127, the test
resets the wrong internal flag: debug.enable_debug() sets _DEBUG_FLAG but the
teardown resets _debug_enabled (which does nothing). Update the test to reset
the correct flag (set debug._DEBUG_FLAG = False) or, preferably, call a public
reset helper if one exists or add one and use it; ensure the final line restores
the same internal state mutated by enable_debug().
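If a public helper is preferred, here is a minimal sketch of what such a debug module could look like. The _DEBUG_FLAG name follows the comment above; everything else is illustrative:

# Sketch of cognee/cli/debug.py with a public reset helper.
_DEBUG_FLAG = False

def enable_debug() -> None:
    global _DEBUG_FLAG
    _DEBUG_FLAG = True

def is_debug_enabled() -> bool:
    return _DEBUG_FLAG

def reset_debug() -> None:
    """Restore the default state; intended for test teardown."""
    global _DEBUG_FLAG
    _DEBUG_FLAG = False

Tests would then call debug.reset_debug() in teardown instead of poking a private attribute.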
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [ | ||
| { | ||
| "name": "stdout", | ||
| "output_type": "stream", | ||
| "text": [ | ||
| "{'id': '3b530220-7e7c-52a2-8b62-ce5adce1a46c', 'created_at': 1751283883122, 'updated_at': 1751283883122, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': \"The joke queries the number of programmers required to change a light bulb and answers, 'None. That’s a hardware issue.' This humor highlights the divide between software and hardware challenges in programming.\"}\n", | ||
| "{'id': '128eb96e-fd36-53ef-ab6d-d4884ecbfee9', 'created_at': 1751283883122, 'updated_at': 1751283883122, 'ontology_valid': False, 'version': 1, 'topological_rank': 0, 'type': 'IndexSchema', 'text': \"Changing a light bulb doesn't require programmers.\"}\n" | ||
| "ename": "", | ||
| "evalue": "", | ||
| "output_type": "error", | ||
| "traceback": [ | ||
| "\u001b[1;31mThe Kernel crashed while executing code in the current cell or a previous cell. \n", | ||
| "\u001b[1;31mPlease review the code in the cell(s) to identify a possible cause of the failure. \n", | ||
| "\u001b[1;31mClick <a href='https://aka.ms/vscodeJupyterKernelCrash'>here</a> for more info. \n", | ||
| "\u001b[1;31mView Jupyter <a href='command:jupyter.viewOutput'>log</a> for further details." | ||
| ] | ||
| } | ||
| ], | ||
| "execution_count": 5 | ||
| "source": [ | ||
| "import os\n", | ||
| "os._exit(0)" | ||
| ] |
Remove kernel-crash cell (os._exit(0)).
For demo content, forcefully exiting the interpreter is disruptive and can cause data loss.
- {
-  "cell_type": "code",
-  "execution_count": null,
-  "metadata": {},
-  "outputs": [
-   {
-    "ename": "",
-    "evalue": "",
-    "output_type": "error",
-    "traceback": [
-     "\u001b[1;31mThe Kernel crashed while executing code in the current cell or a previous cell. ",
-     "\u001b[1;31mPlease review the code in the cell(s) to identify a possible cause of the failure. ",
-     "\u001b[1;31mClick <a href='https://aka.ms/vscodeJupyterKernelCrash'>here</a> for more info. ",
-     "\u001b[1;31mView Jupyter <a href='command:jupyter.viewOutput'>log</a> for further details."
-    ]
-   }
-  ],
-  "source": [
-   "import os\n",
-   "os._exit(0)"
-  ]
- }
+ // removed crash demonstration cell

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In notebooks/cognee_multimedia_demo.ipynb around lines 385 to 404, the cell
forcefully exits the Python interpreter using os._exit(0), which crashes the
kernel and disrupts the demo; remove that line (and the whole cell) and replace
it with a non-destructive termination or a comment describing how to safely stop
execution (e.g., instruct users to stop the kernel via the UI or finish the demo
gracefully) so the notebook no longer causes a kernel crash.
class InvalidConfigAttributeError(CogneeConfigurationError):
    def __init__(
        self,
        attribute: str,
        name: str = "InvalidConfigAttributeError",
        status_code: int = status.HTTP_400_BAD_REQUEST,
    ):
        message = f"'{attribute}' is not a valid attribute of the configuration."
        super().__init__(message, name, status_code)
Status code mismatch for configuration error
InvalidConfigAttributeError inherits from CogneeConfigurationError which defaults to HTTP 500 (server error), but overrides it with HTTP 400 (client error). Configuration errors are typically server-side issues. If this is meant to represent an invalid user-provided configuration value, consider using CogneeValidationError as the base class instead.
Either change the base class:

-class InvalidConfigAttributeError(CogneeConfigurationError):
+class InvalidConfigAttributeError(CogneeValidationError):
     def __init__(
         self,
         attribute: str,
         name: str = "InvalidConfigAttributeError",
         status_code: int = status.HTTP_400_BAD_REQUEST,
     ):

Or use the appropriate status code for configuration errors:

 class InvalidConfigAttributeError(CogneeConfigurationError):
     def __init__(
         self,
         attribute: str,
         name: str = "InvalidConfigAttributeError",
-        status_code: int = status.HTTP_400_BAD_REQUEST,
+        status_code: int = status.HTTP_500_INTERNAL_SERVER_ERROR,
     ):
🤖 Prompt for AI Agents
In cognee/api/v1/exceptions/exceptions.py around lines 5 to 14,
InvalidConfigAttributeError currently subclasses CogneeConfigurationError but
overrides the status_code to HTTP 400, causing a mismatch between the base class
intent (server-side config error) and the client error code; either change the
base class to CogneeValidationError (or another validation-related base) so a
400 is appropriate, or remove the status_code override and let it inherit the
server-side 500 from CogneeConfigurationError; update the class signature and
imports accordingly so the chosen base and status code are consistent.
import os
import argparse
from typing import Any, Sequence, Dict, Type, cast, List
import click
Remove unused imports
Multiple imports are unused in this file: os, cast, and click.
 import sys
-import os
 import argparse
-from typing import Any, Sequence, Dict, Type, cast, List
+from typing import Any, Sequence, Dict, Type, List
-import click
🧰 Tools
🪛 Ruff (0.12.2)
2-2: os imported but unused
Remove unused import: os
(F401)
4-4: typing.cast imported but unused
Remove unused import: typing.cast
(F401)
5-5: click imported but unused
Remove unused import: click
(F401)
🤖 Prompt for AI Agents
In cognee/cli/_cognee.py around lines 2 to 5, remove the unused imports `os`,
`cast`, and `click` from the import block; update the import statement(s) to
only import typing names actually used (e.g., Any, Sequence, Dict, Type, List)
and keep argparse, so the file no longer includes unused symbols and lint
warnings are resolved.
if debug.is_debug_enabled() and raiseable_exception:
    raise raiseable_exception
🛠️ Refactor suggestion
Add exception chaining for better debugging
When re-raising inside an except block, use raise ... from ... so the original exception is preserved as the cause and appears in the traceback.
 if debug.is_debug_enabled() and raiseable_exception:
-    raise raiseable_exception
+    raise raiseable_exception from ex
🧰 Tools
🪛 Ruff (0.12.2)
164-164: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
In cognee/cli/_cognee.py around lines 163 to 164, the code re-raises
raiseable_exception without chaining the original exception; modify the
except/raise logic to preserve the original exception context by raising
raiseable_exception using exception chaining (raise raiseable_exception from
<original_exception_variable>) so the traceback retains the original cause for
easier debugging.
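As a self-contained illustration of the chaining behavior (all names here are hypothetical):

def risky_operation() -> None:
    raise ValueError("original failure")

def run_command() -> None:
    try:
        risky_operation()
    except ValueError as ex:
        # "from ex" records ex as __cause__, so the traceback prints the
        # original failure, then "The above exception was the direct cause
        # of the following exception:", then the re-raised error.
        raise RuntimeError("command failed") from ex

run_command()  # traceback shows both exceptions, linked by the cause chain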
)

def execute(self, args: argparse.Namespace) -> None:
    try:
🛠️ Refactor suggestion
Validate top_k bounds (help text claims max: 100 but not enforced).
Prevent surprising behavior and align with CLI help.
 def execute(self, args: argparse.Namespace) -> None:
     try:
+        if args.top_k < 1 or args.top_k > 100:
+            raise CliCommandException("top-k must be between 1 and 100", error_code=2)
         # Import cognee here to avoid circular imports
         import cognee
🤖 Prompt for AI Agents
In cognee/cli/commands/search_command.py around line 86, the CLI accepts a top_k
parameter but does not enforce the help text's claimed maximum of 100; add a
validation immediately after parsing (inside the try block around line 86) that
ensures top_k is an integer within the bounds 1..100, and if it is out of range
raise a clear argument/CLI error (e.g., argparse.ArgumentTypeError or
click.BadParameter) or print an error and exit non‑zero so the user sees a
concise message like "top_k must be between 1 and 100".
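An alternative is to enforce the bound at parse time with a custom argparse type, so argparse itself emits the usage error. A sketch with hypothetical names:

import argparse

def bounded_top_k(value: str) -> int:
    # argparse calls this for each --top-k value and converts
    # ArgumentTypeError into a standard usage error (exit code 2).
    top_k = int(value)
    if not 1 <= top_k <= 100:
        raise argparse.ArgumentTypeError("top_k must be between 1 and 100")
    return top_k

parser = argparse.ArgumentParser()
parser.add_argument("--top-k", type=bounded_top_k, default=10,
                    help="number of results to return (min: 1, max: 100)")
args = parser.parse_args(["--top-k", "20"])
print(args.top_k)  # 20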
 else:
-    raise InvalidValueError(message=f"Unsupported LLM provider: {provider}")
+    raise UnsupportedLLMProviderError(provider)
Type error: passing enum instance instead of string
The UnsupportedLLMProviderError expects a string parameter, but you're passing a LLMProvider enum instance. This will result in a confusing error message like "Unsupported LLM provider: LLMProvider.UNKNOWN".
 else:
-    raise UnsupportedLLMProviderError(provider)
+    raise UnsupportedLLMProviderError(provider.value)
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 67-150: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
🤖 Prompt for AI Agents
In
cognee/infrastructure/llm/structured_output_framework/litellm_instructor/llm/get_llm_client.py
around lines 149-150, you're raising UnsupportedLLMProviderError with an enum
instance which yields messages like "LLMProvider.UNKNOWN"; change the argument
to a plain string by passing the enum's string value (e.g., provider.value) or
its name (provider.name) so the error message contains the actual provider
identifier; update the raise to pass provider.value (or provider.name if .value
is not a string) and ensure the error type signature still expects a string.
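A quick demonstration of the difference, using a stand-in enum rather than the real LLMProvider:

from enum import Enum

class Provider(Enum):  # stand-in for LLMProvider
    UNKNOWN = "unknown"

provider = Provider.UNKNOWN
print(f"Unsupported LLM provider: {provider}")        # ... Provider.UNKNOWN
print(f"Unsupported LLM provider: {provider.value}")  # ... unknown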
if not data:
    data: list[Data] = await get_dataset_data(dataset_id=dataset_id)

# async with update_status_lock: TODO: Add UI lock to prevent multiple backend requests
if isinstance(dataset, Dataset):
    task_status = await get_pipeline_status([dataset_id], pipeline_name)
else:
    task_status = [
        PipelineRunStatus.DATASET_PROCESSING_COMPLETED
    ]  # TODO: this is a random assignment, find permanent solution

if str(dataset_id) in task_status:
    if task_status[str(dataset_id)] == PipelineRunStatus.DATASET_PROCESSING_STARTED:
        logger.info("Dataset %s is already being processed.", dataset_id)
        pipeline_run = await get_pipeline_run_by_dataset(dataset_id, pipeline_name)
        yield PipelineRunStarted(
            pipeline_run_id=pipeline_run.pipeline_run_id,
            dataset_id=dataset.id,
            dataset_name=dataset.name,
            payload=data,
        )
        return
    elif task_status[str(dataset_id)] == PipelineRunStatus.DATASET_PROCESSING_COMPLETED:
        logger.info("Dataset %s is already processed.", dataset_id)
        pipeline_run = await get_pipeline_run_by_dataset(dataset_id, pipeline_name)
        yield PipelineRunCompleted(
            pipeline_run_id=pipeline_run.pipeline_run_id,
            dataset_id=dataset.id,
            dataset_name=dataset.name,
        )
        return

if not isinstance(tasks, list):
    raise ValueError("Tasks must be a list")
data: list[Data] = await get_dataset_data(dataset_id=dataset.id)
🛠️ Refactor suggestion
Do not treat empty list as “no data”.
if not data fetches dataset data when an empty list is intentionally passed. Check against None.
Apply this diff:

-    if not data:
-        data: list[Data] = await get_dataset_data(dataset_id=dataset.id)
+    if data is None:
+        data: list[Data] = await get_dataset_data(dataset_id=dataset.id)
🤖 Prompt for AI Agents
In cognee/modules/pipelines/operations/pipeline.py around lines 68 to 70, the
current conditional uses "if not data" which treats an intentionally passed
empty list as missing and triggers a fetch; change the check to "if data is
None" so only the absence of a value causes get_dataset_data to be awaited,
preserving empty-list inputs; update the conditional accordingly and keep the
type annotation/assignment semantics the same.
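The distinction in a few lines: an empty list is falsy, so "if not data" cannot tell "no value supplied" apart from "deliberately empty":

data = []              # caller intentionally passed zero records
print(not data)        # True  -> "if not data" would trigger an unwanted re-fetch
print(data is None)    # False -> "if data is None" respects the empty list

data = None            # caller supplied nothing
print(data is None)    # True  -> fetching is appropriate here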
relationships.append(
    (
        source_id,
        target_id_1,
        relationship_name,
        {
            "relationship_name": relationship_name,
            "source_node_id": source_id,
            "target_node_id": target_id_1,
            "ontology_valid": False,
            "feedback_weight": 0,
        },
    )
)

relationships.append(
    (
        source_id,
        target_id_2,
        relationship_name,
        {
            "relationship_name": relationship_name,
            "source_node_id": source_id,
            "target_node_id": target_id_2,
            "ontology_valid": False,
            "feedback_weight": 0,
        },
    )
)
💡 Verification agent
🧩 Analysis chain
Convert UUIDs to strings when persisting edges.
Graph add_edges expects string IDs; passing UUIDs may fail in adapters (e.g., Kuzu). Convert to str for from/to IDs.
Apply this diff:
-        relationships.append(
+        relationships.append(
             (
-                source_id,
-                target_id_1,
+                str(source_id),
+                str(target_id_1),
                 relationship_name,
                 {
                     "relationship_name": relationship_name,
-                    "source_node_id": source_id,
-                    "target_node_id": target_id_1,
+                    "source_node_id": str(source_id),
+                    "target_node_id": str(target_id_1),
                     "ontology_valid": False,
                     "feedback_weight": 0,
                 },
             )
         )
@@
-        relationships.append(
+        relationships.append(
             (
-                source_id,
-                target_id_2,
+                str(source_id),
+                str(target_id_2),
                 relationship_name,
                 {
                     "relationship_name": relationship_name,
-                    "source_node_id": source_id,
-                    "target_node_id": target_id_2,
+                    "source_node_id": str(source_id),
+                    "target_node_id": str(target_id_2),
                     "ontology_valid": False,
                     "feedback_weight": 0,
                 },
             )
         )

Confirm adapters expect str:
🏁 Script executed:

#!/bin/bash
rg -nP 'def\s+add_edges\(' -n cognee/infrastructure/databases/graph | sed 's/:/ -> /'
rg -nP 'add_edges\([^)]*\)' -n cognee/infrastructure/databases/graph/kuzu/adapter.py -n -C2

Length of output: 1116
Convert UUIDs to strings when building relationships in GraphCompletionRetriever
All add_edges methods in our adapters (Neptune, Kuzu, Neo4j, Memgraph) require their edge IDs to be strings (List[Tuple[str, str, str, …]]); passing uuid.UUID objects will break runtime behavior and type checks. Verified signatures via ripgrep in each adapter:
- cognee/infrastructure/databases/graph/neptune_driver/adapter.py (line 538): async def add_edges(self, edges: List[Tuple[str, str, str, Optional[Dict[str, Any]]]])
- cognee/infrastructure/databases/graph/kuzu/adapter.py (line 647): async def add_edges(self, edges: List[Tuple[str, str, str, Dict[str, Any]]])
- cognee/infrastructure/databases/graph/graph_db_interface.py (line 277)
- cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (line 448)
- cognee/infrastructure/databases/graph/memgraph/memgraph_adapter.py (line 413)
cognee/modules/retrieval/graph_completion_retriever.py:241–269 needs updating:
--- a/cognee/modules/retrieval/graph_completion_retriever.py
+++ b/cognee/modules/retrieval/graph_completion_retriever.py
@@ -241,11 +241,11 @@
         relationships.append(
-            (source_id, target_id_1, relationship_name, {
-                "relationship_name": relationship_name,
-                "source_node_id": source_id,
-                "target_node_id": target_id_1,
+            (str(source_id), str(target_id_1), relationship_name, {
+                "relationship_name": relationship_name,
+                "source_node_id": str(source_id),
+                "target_node_id": str(target_id_1),
                 "ontology_valid": False,
                 "feedback_weight": 0,
             }),
@@ -255,11 +255,11 @@
         relationships.append(
-            (source_id, target_id_2, relationship_name, {
-                "relationship_name": relationship_name,
-                "source_node_id": source_id,
-                "target_node_id": target_id_2,
+            (str(source_id), str(target_id_2), relationship_name, {
+                "relationship_name": relationship_name,
+                "source_node_id": str(source_id),
+                "target_node_id": str(target_id_2),
                 "ontology_valid": False,
                 "feedback_weight": 0,
             }),

Please apply this change to ensure compatibility with all graph adapters.
🤖 Prompt for AI Agents
In cognee/modules/retrieval/graph_completion_retriever.py around lines 241 to
269, the relationship tuples are being built with uuid.UUID objects which
adapters expect as strings; update the construction so all node IDs used in the
edge tuples are converted to strings (e.g., use str(source_id),
str(target_id_1), str(target_id_2)) before appending, keeping the rest of the
metadata the same so the edges match the adapters' List[Tuple[str, str, str,
...]] signature.
type_counts = Counter(node_data[1].get("type", {}) for node_data in graph[0])

edge_type_counts = Counter(edge_type[2] for edge_type in graph[1])
🛠️ Refactor suggestion
Fix Counter default for node types (dict is unhashable).
Using {} as a default can raise TypeError; default to empty string.
-type_counts = Counter(node_data[1].get("type", {}) for node_data in graph[0])
+type_counts = Counter(node_data[1].get("type", "") for node_data in graph[0])
 edge_type_counts = Counter(edge_type[2] for edge_type in graph[1])
🤖 Prompt for AI Agents
In cognee/tests/test_search_db.py around lines 164 to 166, the Counter for node
types uses get("type", {}) which can return an unhashable dict and raise
TypeError; change the default to an empty string (or another hashable sentinel)
so Counter receives only hashable values, i.e., replace the default {} with ""
when calling get("type", ...).
| @patch("cognee.cli.debug.enable_debug") | ||
| @patch("cognee.cli.echo.note") | ||
| def test_debug_action_call(self, mock_note, mock_enable_debug): | ||
| """Test that DebugAction enables debug mode""" | ||
| from cognee.cli._cognee import DebugAction | ||
|
|
||
| action = DebugAction([]) | ||
| parser = MagicMock() | ||
| namespace = MagicMock() | ||
|
|
||
| action(parser, namespace, None) | ||
|
|
||
| mock_enable_debug.assert_called_once() | ||
| mock_note.assert_called_once_with("Debug mode enabled. Full stack traces will be shown.") |
Patch the correct symbol used by DebugAction.
DebugAction calls fmt.note(...), not echo.note(...). Current patch won’t intercept the call and the test will fail.
Apply this diff:

-    @patch("cognee.cli.debug.enable_debug")
-    @patch("cognee.cli.echo.note")
+    @patch("cognee.cli.debug.enable_debug")
+    @patch("cognee.cli._cognee.fmt.note")
     def test_debug_action_call(self, mock_note, mock_enable_debug):
🤖 Prompt for AI Agents
In cognee/tests/unit/cli/test_cli_main.py around lines 160 to 173, the test
patches cognee.cli.echo.note but DebugAction actually calls fmt.note; change the
test to patch "cognee.cli._cognee.fmt.note" instead of "cognee.cli.echo.note" so
the call is intercepted. Keep the rest of the test (patching enable_debug and
assertions) unchanged.
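The underlying rule is to patch the name through the namespace the code under test actually resolves; a self-contained sketch with stand-in objects (none of these are the real cognee modules):

from unittest.mock import patch

class Formatter:
    def note(self, msg: str) -> None:
        print(msg)

fmt = Formatter()    # plays the role of cognee.cli.fmt
echo = Formatter()   # plays the role of cognee.cli.echo

def debug_action() -> None:  # plays the role of DebugAction.__call__
    fmt.note("Debug mode enabled. Full stack traces will be shown.")

with patch.object(echo, "note") as mock_note:  # wrong target: never called
    debug_action()
    assert mock_note.call_count == 0

with patch.object(fmt, "note") as mock_note:   # the object DebugAction uses
    debug_action()
    mock_note.assert_called_once()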
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.