
Conversation

@dexters1
Collaborator

Description

Fix Gemini GitHub action

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

@dexters1 dexters1 self-assigned this Apr 16, 2025
@pull-checklist

Please make sure all the checkboxes are checked:

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have added end-to-end and unit tests (if applicable).
  • I have updated the documentation and README.md file (if necessary).
  • I have removed unnecessary code and debug statements.
  • PR title is clear and follows the convention.
  • I have tagged reviewers or team members for feedback.

@dexters1 dexters1 changed the base branch from main to dev April 16, 2025 12:49
@coderabbitai
Contributor

coderabbitai bot commented Apr 16, 2025

Walkthrough

This update introduces substantial enhancements and refactoring across the codebase. Major changes include the introduction of robust rate limiting and retry decorators for both LLM and embedding API calls, with new configuration options and singleton limiter classes. The pipeline task execution system is refactored, replacing the old Task class and pipeline runner with improved, modular implementations. Several new prompt templates are added for knowledge graph extraction and minimal answer generation. The user management system now supports configurable default user credentials via environment variables. Numerous new test modules are introduced, covering rate limiting, embedding, telemetry, and database migration. Additionally, new evaluation scripts and documentation are added, while obsolete notebooks and example scripts are removed.

Changes

Files / Paths Change Summary
.env.template, cognee/base_config.py, cognee/modules/users/methods/create_default_user.py, cognee/modules/users/methods/get_default_user.py Added environment variables and config fields for default user email/password; user methods now use config values.
.github/README_WORKFLOW_MIGRATION.md, .github/workflows/disable_independent_workflows.sh Added documentation and script for migrating test workflows to centralized execution.
cognee/api/v1/cognify/code_graph_pipeline.py, cognee/api/v1/cognify/cognify.py, cognee/eval_framework/corpus_builder/corpus_builder_executor.py, cognee/eval_framework/corpus_builder/task_getters/TaskGetters.py, cognee/eval_framework/corpus_builder/task_getters/get_cascade_graph_tasks.py, cognee/modules/pipelines/__init__.py, cognee/modules/pipelines/operations/run_parallel.py, cognee/tests/integration/run_toy_tasks/run_task_from_queue_test.py, cognee/tests/integration/run_toy_tasks/run_tasks_test.py, examples/python/pokemon_datapoints_example.py Fixed imports to use lowercase task module for Task class.
cognee/api/v1/config/config.py, cognee/infrastructure/databases/relational/config.py Added migration DB config setter and updated type hints for migration config fields.
cognee/eval_framework/corpus_builder/run_corpus_builder.py Reformatted function signature for clarity; no logic changes.
cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py Expanded parameters and logic for task construction; now builds graph/data point tasks explicitly.
cognee/eval_framework/evaluation/deep_eval_adapter.py Changed logic to set retrieval_context only if golden_context exists.
cognee/infrastructure/databases/vector/embeddings/LiteLLMEmbeddingEngine.py, cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py Refactored to use new async rate limiting and retry decorators; removed manual retry logic.
cognee/infrastructure/databases/vector/embeddings/get_embedding_engine.py Passes endpoint config to Ollama embedding engine.
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py Updates data points in DB if they exist; prevents duplicates.
cognee/infrastructure/llm/anthropic/adapter.py, cognee/infrastructure/llm/gemini/adapter.py, cognee/infrastructure/llm/generic_llm_api/adapter.py, cognee/infrastructure/llm/ollama/adapter.py, cognee/infrastructure/llm/openai/adapter.py Added rate limiting and retry decorators to LLM adapter methods.
cognee/infrastructure/llm/config.py Added config fields for LLM/embedding rate limiting and prompt path; updated serialization.
cognee/infrastructure/llm/embedding_rate_limiter.py, cognee/infrastructure/llm/rate_limiter.py New modules implementing singleton-based rate limiting and retry logic for embedding and LLM APIs.
cognee/infrastructure/llm/prompts/answer_simple_question_benchmark2.txt, cognee/infrastructure/llm/prompts/answer_simple_question_benchmark3.txt, cognee/infrastructure/llm/prompts/answer_simple_question_benchmark4.txt, cognee/infrastructure/llm/prompts/generate_graph_prompt_guided.txt, cognee/infrastructure/llm/prompts/generate_graph_prompt_oneshot.txt, cognee/infrastructure/llm/prompts/generate_graph_prompt_simple.txt, cognee/infrastructure/llm/prompts/generate_graph_prompt_strict.txt Added new prompt templates for minimal answer and knowledge graph extraction.
cognee/modules/data/extraction/knowledge_graph/extract_content_graph.py Uses dynamic prompt path from config for system prompt rendering.
cognee/modules/pipelines/operations/run_tasks.py, cognee/modules/pipelines/operations/run_tasks_base.py Refactored: moved core pipeline execution logic to new module; imports new run_tasks_base.
cognee/modules/pipelines/tasks/Task.py Deleted old Task class implementation.
cognee/modules/pipelines/tasks/task.py Added new Task class supporting async, sync, generator, and batching execution.
cognee/modules/search/methods/search.py, cognee/modules/search/types/SearchType.py, cognee-mcp/src/server.py, cognee/tests/test_custom_model.py Renamed COMPLETION to RAG_COMPLETION in search types and updated usage.
cognee/shared/logging_utils.py Added attribute check before assigning exception type in logging.
cognee/tasks/graph/extract_graph_from_data.py Added Optional import; no logic changes.
cognee/tasks/ingestion/migrate_relational_database.py Added logic to remove duplicate edges before adding to graph DB.
cognee/tests/integration/run_toy_tasks/run_task_from_queue_test.py, cognee/tests/integration/run_toy_tasks/run_tasks_test.py Updated test functions to match new task argument/result structure.
cognee/tests/test_cognee_server_start.py, cognee/tests/test_relational_db_migration.py, cognee/tests/test_telemetry.py Added new integration and unit tests for server startup, DB migration, and telemetry.
cognee/tests/unit/entity_extraction/regex_entity_extraction_test.py Disabled a failing test with a TODO comment.
cognee/tests/unit/infrastructure/databases/test_rate_limiter.py, cognee/tests/unit/infrastructure/mock_embedding_engine.py, cognee/tests/unit/infrastructure/test_embedding_rate_limiting_realistic.py, cognee/tests/unit/infrastructure/test_rate_limiting_realistic.py, cognee/tests/unit/infrastructure/test_rate_limiting_retry.py Added comprehensive unit tests for rate limiting and embedding retry logic, including mock engines and realistic scenarios.
evals/README.md, evals/plot_metrics.py, evals/requirements.txt, evals/falkor_01042025/hotpot_qa_falkor_graphrag_sdk.py, evals/graphiti_01042025/hotpot_qa_graphiti.py, evals/mem0_01042025/hotpot_qa_mem0.py Added evaluation scripts, documentation, and plotting utilities for QA system benchmarking.
notebooks/cognee_simple_demo.ipynb Updated to install cognee version 0.1.36.
examples/node/fetch.js, examples/node/handleServerErrors.js, examples/node/main.js, evals/eval_swe_bench.py, evals/eval_utils.py, evals/test_datasets/initial_test/natural_language_processing.txt, evals/test_datasets/initial_test/trump.txt, notebooks/cognee_code_graph_demo.ipynb, notebooks/cognee_hotpot_eval.ipynb, notebooks/hr_demo.ipynb, notebooks/pokemon_datapoints_notebook.ipynb, cognee/modules/pipelines/tasks/Task.py Deleted obsolete scripts, notebooks, and data files.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant API
    participant LLMAdapter
    participant RateLimiter
    participant EmbeddingEngine
    participant EmbeddingRateLimiter

    User->>API: Submit LLM or Embedding request
    API->>LLMAdapter: Call LLM method (decorated)
    LLMAdapter->>RateLimiter: Check/wait rate limit
    RateLimiter-->>LLMAdapter: Allow or delay
    LLMAdapter->>LLMAdapter: Retry on rate limit error (decorator)
    LLMAdapter-->>API: Return LLM result

    API->>EmbeddingEngine: Call embed_text (decorated)
    EmbeddingEngine->>EmbeddingRateLimiter: Check/wait rate limit
    EmbeddingRateLimiter-->>EmbeddingEngine: Allow or delay
    EmbeddingEngine->>EmbeddingEngine: Retry on rate limit error (decorator)
    EmbeddingEngine-->>API: Return embeddings
sequenceDiagram
    participant User
    participant Pipeline
    participant Task
    participant Telemetry

    User->>Pipeline: Start pipeline with tasks
    Pipeline->>Task: Execute (async, sync, generator, etc.)
    Task->>Telemetry: Send start event
    Task->>Task: Run task logic (yield results)
    Task->>Telemetry: Send complete/error event
    Task-->>Pipeline: Yield results (batched if needed)
    Pipeline->>Task: Execute next task with results
    Pipeline-->>User: Stream results
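The first diagram corresponds to a decorator-composition pattern. Below is a minimal sketch of how such decorators might be layered on an adapter method; it is purely illustrative, and the names RateLimiter, rate_limit_async, and sleep_and_retry_async are assumptions, not the PR's actual API:

```python
import asyncio
import functools
import random
import time

class RateLimiter:
    """Singleton sliding-window limiter: at most max_calls per period seconds."""
    _instance = None

    def __new__(cls, max_calls=60, period=60.0):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.max_calls = max_calls
            cls._instance.period = period
            cls._instance.calls = []
        return cls._instance

    async def wait(self):
        while True:
            now = time.monotonic()
            # Drop timestamps that have aged out of the window
            self.calls = [t for t in self.calls if now - t < self.period]
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            await asyncio.sleep(self.period - (now - self.calls[0]))

def rate_limit_async(func):
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        await RateLimiter().wait()  # block until a request slot is free
        return await func(*args, **kwargs)
    return wrapper

def sleep_and_retry_async(max_retries=3, base_delay=1.0):
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except Exception as error:
                    if attempt == max_retries or "rate limit" not in str(error).lower():
                        raise
                    # Exponential backoff with jitter before the next attempt
                    await asyncio.sleep(base_delay * 2**attempt + random.random())
        return wrapper
    return decorator

@rate_limit_async
@sleep_and_retry_async()
async def acreate_structured_output(prompt: str) -> str:
    ...  # the adapter's actual LLM call would go here
```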

Possibly related PRs

  • topoteretes/cognee#707: Directly modifies the same exception_handler logic in cognee/shared/logging_utils.py for exception type assignment.
  • topoteretes/cognee#593: Related to user configuration and authorization, overlapping with new default user config changes.
  • topoteretes/cognee#682: Adds the same default user environment variables and config fields, directly related to this PR's user management improvements.

Poem

A rabbit hops through code anew,
With rate limits strong and retries too!
Pipelines flow, tasks now refined,
User defaults no longer hard-defined.
Prompts for graphs and answers bright,
Tests and docs to guide the night.
🐇 Cheers to change—our code takes flight!

Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 5

🔭 Outside diff range comments (1)
cognee/tests/unit/entity_extraction/regex_entity_extraction_test.py (1)

227-227: ⚠️ Potential issue

Potential inconsistency with disabled money extraction test

This line asserts that "MONEY" entity type is present in the extracted entities, but the specific test for money extraction is currently disabled because the regex is failing.

Either:

  1. Fix the money regex issue
  2. Temporarily remove this assertion
  3. Verify if this test is somehow still passing with the current implementation

You might want to run this test in isolation to check if it's actually passing or if it's being skipped:

pytest cognee/tests/unit/entity_extraction/regex_entity_extraction_test.py::test_extract_multiple_entity_types -v
🧹 Nitpick comments (50)
cognee/tasks/graph/extract_graph_from_data.py (1)

2-2: Remove unused import

The Optional type is imported but not used anywhere in this file.

-from typing import Type, List, Optional
+from typing import Type, List

cognee/base_config.py (1)

17-18: Good addition of configurable default user credentials

Adding environment variable-based configuration for default user credentials makes the system more flexible and easier to configure across different environments.

Consider updating the to_dict() method to include these new fields if they should be part of the serialized configuration (or explicitly document why they're excluded if that's intentional).

.env.template (1)

4-6: Environment variables for default user configuration look good

The template provides empty placeholders for the new default user configuration variables, which is consistent with other sections of the template.

You might consider adding a brief comment explaining the purpose of these variables and whether they're required or optional.

cognee/modules/data/extraction/knowledge_graph/extract_content_graph.py (1)

1-1: Remove unused import.

The Optional type is imported but not used anywhere in the file.

-from typing import Type, Optional
+from typing import Type

cognee/eval_framework/corpus_builder/run_corpus_builder.py (1)

3-3: Remove unused import.

The Optional type is imported but not used anywhere in the file.

-from typing import List, Optional
+from typing import List

cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)

130-137: Consider batching queries to reduce overhead.
You perform an individual query for each data point ID. For large input lists, this results in many sequential round-trips and can degrade performance. An alternative is to retrieve all matching records at once, then update or insert them as needed.
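A sketch of the batched alternative, assuming an async SQLAlchemy session and a PGVectorDataPoint model with an `id` primary key (both names are hypothetical; the adapter's actual types may differ):

```python
from sqlalchemy import select

async def upsert_data_points(session, PGVectorDataPoint, data_points):
    ids = [dp.id for dp in data_points]
    # One SELECT for the whole batch instead of one query per data point
    result = await session.execute(
        select(PGVectorDataPoint).where(PGVectorDataPoint.id.in_(ids))
    )
    existing = {row.id: row for row in result.scalars()}
    for dp in data_points:
        if dp.id in existing:
            existing[dp.id].vector = dp.vector  # update existing row in place
        else:
            session.add(dp)  # stage a new row for insertion
    await session.commit()
```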

cognee/tests/test_cognee_server_start.py (1)

14-28: Good server setup approach with improvement opportunities.

The server setup in a separate process is a good approach for integration testing. However, consider two improvements:

  1. For testing environments, binding to 127.0.0.1 instead of 0.0.0.0 would be more secure.
  2. The preexec_fn=os.setsid properly sets up a process group for clean termination.
-                "--host",
-                "0.0.0.0",
+                "--host",
+                "127.0.0.1",
cognee/tests/unit/infrastructure/mock_embedding_engine.py (1)

37-55: Well-implemented embedding method with configurable behavior.

The implementation correctly applies the rate limiting and retry decorators while supporting configurable delays and failures for resilience testing. One minor improvement would be to add slight variability to the embeddings to better simulate real-world conditions.

Consider adding some randomness to the generated embeddings:

-        # Return mock embeddings of the correct dimension
-        return [[0.1] * self.dimensions for _ in text]
+        # Return mock embeddings with slight randomness for more realistic testing
+        import random
+        return [
+            [0.1 + (random.random() - 0.5) * 0.02 for _ in range(self.dimensions)]
+            for _ in text
+        ]
cognee/infrastructure/llm/prompts/generate_graph_prompt_guided.txt (1)

1-77: Comprehensive and well-structured knowledge graph extraction prompt.

The prompt provides detailed guidelines with excellent organization into sections (Node Guidelines, Property & Data Guidelines, etc.). The instructions are specific and include clear examples for proper formatting of node IDs, properties, dates, and relationships.

Consider adding a complete example of the expected output format at the end of the prompt to provide a clearer template for the LLM to follow. For example:

**Expected Output Format Example**:

```json
{
  "nodes": [
    {
      "id": "Marie Curie",
      "label": "Person",
      "properties": {
        "birth_date": "1867-11-07",
        "birth_place": "Warsaw",
        "field": "Physics, Chemistry"
      }
    },
    {
      "id": "Radioactivity",
      "label": "Concept",
      "properties": {
        "discovery_date": "1896"
      }
    }
  ],
  "edges": [
    {
      "source": "Marie Curie",
      "target": "Radioactivity",
      "label": "researched"
    },
    {
      "source": "Radioactivity",
      "target": "Marie Curie",
      "label": "discovered_by"
    }
  ]
}

```

cognee/infrastructure/llm/prompts/generate_graph_prompt_simple.txt (1)

1-28: Concise and clear knowledge graph extraction prompt.

This simplified prompt maintains the essential guidelines while being more concise. The rules are clear and emphasize the most critical aspects of knowledge graph construction. The consistency with the more detailed guided prompt is good for maintaining standardized output formats.


Consider these two enhancements:

  1. Add more detail on edge directionality to ensure proper semantic relationships:

+   - Edges must have logical direction: from source to target
+   - Example: "Marie Curie" —[born_in]→ "Warsaw" (not "Warsaw" —[birth_place_of]→ "Marie Curie")

  2. Add a concrete example of the complete expected output format to provide clearer guidance:

+**Example Output**:
+```json
+{
+  "nodes": [
+    {"id": "Albert Einstein", "label": "Person", "properties": {"birth_date": "1879-03-14"}},
+    {"id": "Theory of Relativity", "label": "Concept", "properties": {}}
+  ],
+  "edges": [
+    {"source": "Albert Einstein", "target": "Theory of Relativity", "label": "developed"}
+  ]
+}
+```

cognee/tests/unit/infrastructure/databases/test_rate_limiter.py (1)

4-5: Remove unused import

The time module is imported but not used in this file.

import asyncio
-import time
from unittest.mock import patch

cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)

61-80: Good refactoring of retry logic using decorators

The addition of the @embedding_sleep_and_retry_async() decorator and simplification of the _get_embedding method is a good refactoring that centralizes retry logic and improves code maintainability.

Consider combining the nested with statements for better readability:

-        async with aiohttp.ClientSession() as session:
-            async with session.post(
-                self.endpoint, json=payload, headers=headers, timeout=60.0
-            ) as response:
-                data = await response.json()
-                return data["embedding"]
+        async with aiohttp.ClientSession() as session, session.post(
+            self.endpoint, json=payload, headers=headers, timeout=60.0
+        ) as response:
+            data = await response.json()
+            return data["embedding"]

evals/README.md (1)

1-93: Comprehensive evaluation documentation

This README provides clear instructions for reproducing the evaluation process and interpreting results. The documentation is well-structured with setup instructions, execution steps, and limitations of the approach.

Minor style improvements:

  • Line 71: Consider changing "In order to ensure" to "To ensure" for conciseness
  • Line 87: "labor-intensive" should be hyphenated

cognee/modules/pipelines/tasks/task.py (1)

52-66: Consider exception handling or logging in execute_async_generator.

In production scenarios, one failing iteration could disrupt the entire async generator. Consider adding a try/except block and a logging statement to capture errors gracefully (or escalate them), ensuring partial results are not silently discarded.
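A sketch of that guard, assuming the Task class stores the callable as `self.executable` and a module-level `logger` exists (both assumptions based on the surrounding code, not verified):

```python
async def execute_async_generator(self, *args, **kwargs):
    try:
        async for partial_result in self.executable(*args, **kwargs):
            yield partial_result
    except Exception as error:
        # Log and re-raise so the pipeline can emit its error telemetry,
        # rather than silently discarding the results produced so far.
        logger.error("Task %s failed mid-iteration: %s", self.executable, error)
        raise
```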

cognee/tests/unit/infrastructure/test_embedding_rate_limiting_realistic.py (5)

4-4: Remove unused import.

lru_cache is imported but unused.

-from functools import lru_cache


7-7: Remove unused import.

LLMConfig is imported but unused.

-from cognee.infrastructure.llm.config import LLMConfig, get_llm_config
+from cognee.infrastructure.llm.config import get_llm_config


10-10: Remove unused import.

LiteLLMEmbeddingEngine is imported but never referenced.

-from cognee.infrastructure.databases.vector.embeddings.LiteLLMEmbeddingEngine import (
-    LiteLLMEmbeddingEngine,
-)


28-32: Prefer pytest fixtures to manage environment variables.

Manually setting and popping environment variables could cause side effects if tests run in parallel. Using a pytest fixture to wrap environment changes can avoid shared state issues.
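For example, pytest's built-in monkeypatch fixture restores the environment automatically on teardown (the variable names below are illustrative, not necessarily the ones this test sets):

```python
import pytest

@pytest.fixture
def embedding_rate_limit_env(monkeypatch):
    # monkeypatch undoes both calls when the test finishes, even on failure,
    # so parallel or subsequent tests see the original environment.
    monkeypatch.setenv("EMBEDDING_RATE_LIMIT_ENABLED", "true")
    monkeypatch.setenv("EMBEDDING_RATE_LIMIT_REQUESTS", "5")
```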


66-67: Optionally differentiate rate-limited vs. other errors.

Catching all exceptions as rate-limited might obscure genuine errors. Logging or re-raising non-rate-limit exceptions could improve debugging.
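One way to keep genuine failures visible (a sketch; `engine`, `batch`, `rate_limited_count`, and the `status_code` attribute are assumptions about the surrounding test code and client library):

```python
try:
    await engine.embed_text(batch)
except Exception as error:
    is_rate_limit = (
        getattr(error, "status_code", None) == 429
        or "rate limit" in str(error).lower()
    )
    if is_rate_limit:
        rate_limited_count += 1
    else:
        raise  # surface real errors instead of counting them as throttling
```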

cognee/eval_framework/corpus_builder/task_getters/get_default_tasks_by_indices.py (2)

28-28: Remove or leverage the user parameter.

user is declared but never used in these functions. Either remove it if unnecessary or incorporate it into task creation logic if it will be used in future expansions.

Also applies to: 51-52


36-47: Ensure consistent naming for large batch tasks.

graph_task and add_data_points_task each use {"batch_size": 10}. Consider extracting this into a shared constant or clarifying it via a docstring to maintain consistency across all tasks.

cognee/modules/pipelines/operations/run_tasks_base.py (1)

1-1: Remove unused inspect import.

The inspect module is never referenced in this file, so consider removing it to maintain a clean import list and avoid confusion.

- import inspect
  from cognee.shared.logging_utils import get_logger

cognee/tests/unit/infrastructure/test_rate_limiting_realistic.py (1)

2-4: Remove unused imports time and lru_cache.

Both imports are never referenced. Removing them helps keep the code clean and free of dead references.

- import time
- from functools import lru_cache
  import os

cognee/tests/test_relational_db_migration.py (9)

29-38: Ensure consistent naming convention in setup function.

The function setup_test_db is clear, but the returned variable name (migration_engine) and the function name might not be fully aligned semantically. Consider renaming the function or return value to reflect its role more consistently, e.g., prepare_migration_db.


40-45: Validate schema extraction.

relational_db_migration extracts the schema and immediately migrates it. Consider verifying that the schema isn't empty before proceeding, to handle unexpected or empty relational database states gracefully.
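For instance (`extract_schema` here is a stand-in for whatever method the migration engine actually exposes):

```python
schema = await relational_engine.extract_schema()
assert schema, "Extracted schema is empty; is the source database reachable and populated?"
```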


56-61: Prefer a direct relationship assignment.

Instead of branching on postgresql vs. "other" for relationship_label, consider standardizing the label or referencing a dictionary for clarity:

labels = {
    "postgresql": "reports_to",
    "default": "ReportsTo"
}
relationship_label = labels.get(migration_db_provider, labels["default"])

65-67: Initialize sets outside the if-block for clarity.

distinct_node_names and found_edges are used in all subsequent blocks. Keeping them initialized up front is clear, but consider explaining their purpose with a short descriptive comment.


108-119: Handle unsupported providers more gracefully.

Raising a ValueError here is fine, but consider providing recovery steps or suggestions, or a clearer error message, e.g., "Unsupported graph database: {graph_db_provider}. Please specify one of: [neo4j, kuzu, networkx].".



111-111: Rename the unused loop variable to underscore.

According to the static analysis hint (B007), the variable edge_data is not used. Renaming it to _edge_data (or simply _) clarifies that it is intentionally unused:

- for src, tgt, key, edge_data in edges:
+ for src, tgt, key, _ in edges:


121-127: Consider documenting node/edge expectations.

The assertion for 8 nodes and 7 edges is strongly tied to specific test data. A docstring or inline comment clarifying the test data’s structure (Employee IDs, etc.) would help future maintainers follow the logic.


165-169: Clarify difference in DB sizes.

The note comments mention that Postgres and SQLite data differ. Consider referencing the relevant script or steps for generating each dataset so new contributors know why the counts differ.


221-233: Document dependency on external scripts.

Here you describe that we must run Chinook_PostgreSql.sql. Provide a link or instruction for quickly enabling the environment (e.g., psql -U <user> -f <path>). This helps others replicate the test environment easily.

cognee/tests/test_telemetry.py (4)

19-31: Prevent concurrency issues with .anon_id creation.

Creating or modifying .anon_id directly in the test might be problematic if multiple tests run in parallel or if the file is present from a previous run. Consider isolating test files in a temporary directory or mocking file I/O to avoid conflicts.


38-44: Be cautious with environment variable manipulation.

The test temporarily deletes or sets ENV and TELEMETRY_DISABLED. This can cause side effects in parallel tests. In a larger suite, prefer context managers or a fixture-based approach to isolate environment variable changes.
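A fixture-free alternative is unittest.mock.patch.dict, which snapshots os.environ and restores it when the block exits (a sketch):

```python
import os
from unittest.mock import patch

def test_telemetry_respects_disable_flag():
    with patch.dict(os.environ, {"TELEMETRY_DISABLED": "1"}):
        ...  # exercise telemetry here; the variable is removed again on exit
```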


86-97: Clean up environment variable settings systematically.

It's good that you clean up TELEMETRY_DISABLED. For consistency and safety, consider using a try/finally block or a dedicated fixture to avoid leftover environment variables if an exception is raised mid-test.


99-119: Use environment mocking for dev/test checks.

Setting ENV = "dev" in code might mask local environment settings. A more robust approach is to patch os.environ or a config object, verifying that telemetry is off in dev. This helps ensure test isolation.

evals/falkor_01042025/hotpot_qa_falkor_graphrag_sdk.py (5)

5-5: Remove unused import.

URL from graphrag_sdk.source is imported but not used. Remove it to keep the imports clean and avoid confusion.

Apply this diff to remove the unused import:

- from graphrag_sdk.source import URL, STRING
+ from graphrag_sdk.source import STRING


35-63: Consider partial ontology creation.

If ontology.from_sources fails mid-way (e.g., a batch yields errors), the entire run is aborted. As a future enhancement, you might handle partial failure or data skipping to salvage partial ontology generation.


66-119: Implement concurrency checks or locks for graph recreation.

If multiple processes or tests attempt to recreate the same graph concurrently, it may lead to inconsistent states. Consider implementing locking or using a safer transaction-based approach, especially for production.
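For cross-process safety, a file lock is the simplest option; a sketch using the third-party filelock package (the lock path and wrapper function are illustrative):

```python
from filelock import FileLock

def recreate_graph_safely(recreate_fn):
    # Only one process at a time may rebuild the graph; others block here
    # for up to five minutes before timing out.
    with FileLock("/tmp/falkor_graph_recreate.lock", timeout=300):
        recreate_fn()
```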


123-159: Provide fallback answers.

When an exception occurs in chat.send_message, you return a generic error response. Consider logging the exception details more thoroughly or providing fallback logic (e.g., partial facts from the knowledge graph) for a more user-friendly experience.


161-199: Validate file existence early.

You check if not os.path.exists(config.ontology_file) to create a new ontology. If the corpus file is also missing or invalid, or if the user inadvertently sets ontology_file incorrectly, the pipeline might fail in a less obvious location. Consider verifying all required paths up front.

evals/graphiti_01042025/hotpot_qa_graphiti.py (2)

8-9: Remove the unused 'OpenAI' import.
Since the import is never used, removing it keeps the module clean.

Use this diff:

 from langchain_openai import ChatOpenAI
-from openai import OpenAI
 from tqdm import tqdm


93-95: Consider adding error handling around LLM invocation.
Wrapping the call in a try/except block provides graceful fallbacks or logs in production environments.

Here's an example approach:

         # Get answer from LLM
-        response = await llm.ainvoke(messages)
-        answer = response.content
+        try:
+            response = await llm.ainvoke(messages)
+            answer = response.content
+        except Exception as e:
+            logger.error(f"LLM call failed: {str(e)}")
+            answer = "Error: LLM call failed"
cognee/infrastructure/llm/rate_limiter.py (2)

53-57: Remove unused imports.
Ruff flagged that these imports are unused. Removing them simplifies maintenance.

-import threading
-import logging
-import functools
-import openai
-import os


133-152: Consider renaming for clarity.
Since the method returns True when the request is allowed, hit_limit may be misleading. A more descriptive name can reduce confusion.

cognee/infrastructure/llm/embedding_rate_limiter.py (1)

39-49: Docstring mismatch regarding the limits library.
The docstring mentions using the limits library, but the implementation appears manual. Consider updating the docstring or using a consistent approach.

cognee/tests/unit/infrastructure/test_rate_limiting_retry.py (2)

3-5: Remove unused or duplicate imports.

The patch, MagicMock, and lru_cache imports are never used, and os is imported twice (lines 3 and 381). Consider removing the unused imports and consolidating the os import at the top.

Apply this diff to remove and consolidate imports:

-import os
-import time
-from unittest.mock import patch, MagicMock
-from functools import lru_cache
+import time
+import os

...

-import os

Also applies to: 381-381



56-66: Prefer using testing framework output over print statements.

While printing test status can be helpful, relying on standard unittest or pytest assertions and reporting provides more structured test outputs and better integration with CI/CD.

evals/mem0_01042025/hotpot_qa_mem0.py (1)

28-28: Rename the unused underscore variable.

You are not using the loop index i in this block. Renaming it to _ or _doc_idx clarifies that it's unused.

-for i, document in enumerate(tqdm(corpus, desc="Adding documents")):
+for _, document in enumerate(tqdm(corpus, desc="Adding documents")):

evals/plot_metrics.py (1)

210-210: Use dictionary iteration directly instead of calling .keys().

Python allows iterating directly over dictionaries without explicitly calling .keys(). This improves readability and aligns with best practices.

-for system in all_systems_metrics.keys():
+for system in all_systems_metrics:

🛑 Comments failed to post (5)
.github/workflows/disable_independent_workflows.sh (1)

4-4: ⚠️ Potential issue

Add error handling after cd command.

To prevent accidental destructive operations if the directory does not exist, append || exit 1 after the cd command as recommended by Shellcheck.

-cd "$(dirname "$0")"
+cd "$(dirname "$0")" || exit 1

cognee/tests/unit/entity_extraction/regex_entity_extraction_test.py (1)

84-97: ⚠️ Potential issue

Temporarily disabled test needs follow-up action

The test for money entity extraction has been commented out and replaced with a simple pass statement. While the TODO comment clearly explains that the regex is failing and needs to be fixed, it would be better to:

  1. Add a more specific tracking mechanism (like a JIRA ticket number) to ensure this doesn't get forgotten
  2. Consider using pytest's built-in skip or xfail mechanisms instead of commenting out the test logic

Consider replacing the current implementation with a proper pytest skip:

@pytest.mark.asyncio
async def test_extract_money(regex_extractor):
    """Test extraction of monetary amounts."""
-    # TODO: Lazar to fix regex for test, it's failing currently
-    pass
-    # text = "The product costs $1,299.99 or €1.045,00 depending on your region."
-    # entities = await regex_extractor.extract_entities(text)
-
-    # Filter only MONEY entities
-    # money_entities = [e for e in entities if e.is_a.name == "MONEY"]
-
-    # assert len(money_entities) == 2
-    # assert "$1,299.99" in [e.name for e in money_entities]
-    # assert "€1.045,00" in [e.name for e in money_entities]
+    pytest.skip("TODO: Lazar to fix regex for test, it's failing currently")
+    text = "The product costs $1,299.99 or €1.045,00 depending on your region."
+    entities = await regex_extractor.extract_entities(text)
+
+    # Filter only MONEY entities
+    money_entities = [e for e in entities if e.is_a.name == "MONEY"]
+
+    assert len(money_entities) == 2
+    assert "$1,299.99" in [e.name for e in money_entities]
+    assert "€1.045,00" in [e.name for e in money_entities]
cognee/shared/logging_utils.py (1)

216-217: ⚠️ Potential issue

Incorrect attribute check

The call passes __name__ as a bare name instead of the string "__name__", so hasattr looks up an attribute named after the enclosing module rather than checking whether the exception type has a name.

-            if hasattr(exc_type, __name__):
+            if hasattr(exc_type, "__name__"):
cognee/tests/test_cognee_server_start.py (1)

29-30: 🛠️ Refactor suggestion

Replace fixed sleep with polling for better test reliability.

Using a fixed 20-second sleep can make tests brittle and slow - too short for some environments and unnecessarily long for others.

Consider implementing a polling approach that checks if the server is ready:

-        # Give the server some time to start
-        time.sleep(20)
+        # Poll until server is ready or timeout occurs
+        max_wait = 30  # Maximum wait time in seconds
+        start_time = time.time()
+        server_ready = False
+        
+        while not server_ready and time.time() - start_time < max_wait:
+            try:
+                response = requests.get("http://localhost:8000/health", timeout=1)
+                if response.status_code == 200:
+                    server_ready = True
+                    break
+            except requests.RequestException:
+                pass
+            time.sleep(0.5)
+        
+        if not server_ready:
+            stderr = cls.server_process.stderr.read().decode("utf-8")
+            print(f"Server failed to start within {max_wait} seconds: {stderr}", file=sys.stderr)
+            raise TimeoutError(f"Server failed to start within {max_wait} seconds")
cognee/modules/pipelines/tasks/task.py (1)

20-45: 🛠️ Refactor suggestion

Add validation for task_config and _next_batch_size.

Although the constructor sets default values, there's no check to ensure task_config["batch_size"] is a positive integer. A guard for negative or zero values could help prevent faulty usage and runtime errors.

You can extend the constructor to validate batch_size, for example:

 if task_config is not None:
     self.task_config = task_config

+    if "batch_size" in task_config:
+        if not isinstance(task_config["batch_size"], int) or task_config["batch_size"] < 1:
+            raise ValueError("task_config['batch_size'] must be a positive integer")
For reference, the full constructor with the suggested validation applied:

    def __init__(self, executable, *args, task_config=None, **kwargs):
        self.executable = executable
        self.default_params = {"args": args, "kwargs": kwargs}

        if inspect.isasyncgenfunction(executable):
            self.task_type = "Async Generator"
            self._execute_method = self.execute_async_generator
        elif inspect.isgeneratorfunction(executable):
            self.task_type = "Generator"
            self._execute_method = self.execute_generator
        elif inspect.iscoroutinefunction(executable):
            self.task_type = "Coroutine"
            self._execute_method = self.execute_coroutine
        elif inspect.isfunction(executable):
            self.task_type = "Function"
            self._execute_method = self.execute_function
        else:
            raise ValueError(f"Unsupported task type: {executable}")

        if task_config is not None:
            self.task_config = task_config

            if "batch_size" in task_config:
                if not isinstance(task_config["batch_size"], int) or task_config["batch_size"] < 1:
                    raise ValueError("task_config['batch_size'] must be a positive integer")

            if "batch_size" not in task_config:
                self.task_config["batch_size"] = 1

    def run(self, *args, **kwargs):

@Vasilije1990 Vasilije1990 self-requested a review April 16, 2025 13:21
@Vasilije1990 Vasilije1990 merged commit acd7abb into dev Apr 16, 2025
1 of 2 checks passed
@Vasilije1990 Vasilije1990 deleted the fix-gemini-gh-action branch April 16, 2025 13:22
@coderabbitai coderabbitai bot mentioned this pull request Jul 14, 2025