feat: added label field to DataItem dataclass #1778

mohitk-patwari · 2025-11-11T19:05:03Z

Description

This PR adds an optional label field to the DataItem dataclass so that dataset entries in the Cognee AI pipeline can be tagged and classified flexibly.
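
For illustration, here is a minimal sketch of what a dataclass with an optional label could look like. The fields id, name, source, and status, and the stand-in DataItemStatus enum, are assumptions made for this example and are not confirmed by the description; only the optional label field comes from the PR itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataItemStatus(Enum):
    # Stand-in for the project's DataItemStatus enum (assumption for this sketch).
    DATA_ITEM_PROCESSING_COMPLETED = "DATA_ITEM_PROCESSING_COMPLETED"


@dataclass
class DataItem:
    # id, name, source, and status are assumed fields, used here only for illustration.
    id: str
    name: str
    source: str
    status: DataItemStatus
    # The optional field: it defaults to None, so callers that omit it keep working.
    label: Optional[str] = None


labeled = DataItem(
    id="123",
    name="Sample Item",
    source="mock_source",
    status=DataItemStatus.DATA_ITEM_PROCESSING_COMPLETED,
    label="important",
)
unlabeled = DataItem(
    id="124",
    name="Sample Item Without Label",
    source="mock_source",
    status=DataItemStatus.DATA_ITEM_PROCESSING_COMPLETED,
)
```

Because the field has a default, it has to come after all fields without defaults, which is why label is declared last in this sketch.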

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update
Code refactoring
Performance improvement
Other (please specify):

Screenshots/Videos (if applicable)

N/A - this change affects backend logic only

Pre-submission Checklist

I have tested my changes thoroughly before submitting this PR
This PR contains minimal changes necessary to address the issue/feature
My code follows the project's coding standards and style guidelines
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if applicable)
All new and existing tests pass
I have searched existing PRs to ensure this change hasn't been submitted already
I have linked any relevant issues in the description
My commits have clear and descriptive messages

DCO Affirmation

I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.

pull-checklist · 2025-11-11T19:05:10Z

Please make sure all the checkboxes are checked:

I have tested these changes locally.
I have reviewed the code changes.
I have added end-to-end and unit tests (if applicable).
I have updated the documentation and README.md file (if necessary).
I have removed unnecessary code and debug statements.
PR title is clear and follows the convention.
I have tagged reviewers or team members for feedback.

coderabbitai · 2025-11-11T19:05:20Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

A multi-faceted update introducing backend access control configuration, refactoring the retrieval system to support structured outputs via response_model parameter, adding session persistence pipelines, implementing chunking with overlap, extending Edge metadata with edge_text, and updating numerous tests and examples to align with these changes.

Changes

Cohort / File(s)	Change Summary
Configuration & Environment `.env.template`, `entrypoint.sh`	ENABLE_BACKEND_ACCESS_CONTROL default changed from False to True; gunicorn/debugpy invocations updated to use exec and enhanced logging (--access-logfile, --error-logfile).
Backend Access Control Infrastructure `cognee/context_global_variables.py`, `cognee/api/client.py`, `cognee/modules/search/methods/search.py`, `cognee/modules/users/methods/...`	Added multi_user_support_possible() and backend_access_control_enabled() functions; initialize logging and emit startup message; replaced direct env checks with centralized backend_access_control_enabled() calls across search and authentication modules.
Public API Exports `cognee/__init__.py`, `cognee/modules/run_custom_pipeline/__init__.py`	Added run_custom_pipeline export at package level.
Retrieval System Refactoring (response_model) `cognee/modules/retrieval/base_retriever.py`, `cognee/modules/retrieval/base_graph_retriever.py`, `cognee/modules/retrieval/completion_retriever.py`, `cognee/modules/retrieval/entity_completion_retriever.py`, `cognee/modules/retrieval/graph_completion_*.py`, `cognee/modules/retrieval/temporal_retriever.py`, `cognee/modules/retrieval/utils/completion.py`	Extended get_completion() signatures across all retriever classes to accept response_model: Type = str parameter; updated return types from str/single value to List[Any]; refactored completion.py to rename generate_structured_completion → generate_completion.
Session Persistence & Memify `cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py`, `cognee/tasks/memify/cognify_session.py`, `cognee/tasks/memify/extract_user_sessions.py`, `cognee/tasks/memify/__init__.py`	Added persist_sessions_in_knowledge_graph_pipeline orchestrating session extraction, enrichment, and persistence; added cognify_session and extract_user_sessions task functions; exposed new tasks in memify init.
Edge & Chunk Model Extensions `cognee/infrastructure/engine/models/Edge.py`, `cognee/modules/chunking/models/DocumentChunk.py`, `cognee/modules/graph/utils/expand_with_nodes_and_edges.py`, `cognee/modules/graph/utils/resolve_edges_to_text.py`	Added edge_text field with auto-population validator to Edge; updated DocumentChunk.contains to support (Edge, Entity) tuples; modified expand_with_nodes_and_edges to wrap entities in Edge-Entity pairs; enhanced resolve_edges_to_text with node extraction and connection formatting helpers.
Chunking System `cognee/modules/chunking/text_chunker_with_overlap.py`	Introduced TextChunkerWithOverlap class supporting configurable overlap ratios, accumulation-driven chunking, and overlap preservation across chunk boundaries.
Graph & Search Infrastructure `cognee/modules/graph/cognee_graph.py`, `cognee/modules/retrieval/utils/brute_force_triplet_search.py`, `cognee/modules/retrieval/cypher_search_retriever.py`, `cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py`	Updated edge distance lookup to use edge_text as primary key with relationship_type fallback; extended brute_force triplet search to project edge_text; wrapped cypher results with jsonable_encoder; added is_empty() method to NeptuneAnalyticsAdapter.
Pipeline & Storage Tasks `cognee/modules/pipelines/models/DataItem.py`, `cognee/modules/pipelines/models/__init__.py`, `cognee/modules/run_custom_pipeline/run_custom_pipeline.py`, `cognee/tasks/storage/index_data_points.py`, `cognee/tasks/storage/index_graph_edges.py`	Added DataItem dataclass with optional label field and init.py re-export; implemented run_custom_pipeline orchestration function; refactored index_data_points to group by type/field with lazy index creation; refactored index_graph_edges to delegate to index_data_points via create_edge_type_datapoints helper.
Vector Database & Error Messages `cognee/infrastructure/databases/vector/create_vector_engine.py`, `cognee/modules/users/methods/get_default_user.py`	Corrected error message from "graph database provider" to "vector database provider"; updated get_default_user return type from SimpleNamespace to User.
Test Suite Enhancements `cognee/tests/test_*.py`, `cognee/tests/unit/...`	Updated existing tests to use keyword arguments, access nested search results ["search_result"][0], conditionally handle backend_access_control modes, and verify edge_text presence; added comprehensive test suites for TextChunkerWithOverlap, memify tasks (cognify_session, extract_user_sessions), index_data_points, and structured output retrieval across all retriever implementations; removed obsolete test_get_structured_completion tests.
Examples `examples/python/*.py`	Disabled backend_access_control in code_graph, relational_database_migration examples; added conversation_session_persistence_example and run_custom_pipeline_example; updated agentic_reasoning and memify_coding_agent examples to access nested search results; simplified simple_example by removing commented output documentation.
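
To make the "chunking with overlap" entry above more concrete, here is a rough sketch of the general idea. It is not the TextChunkerWithOverlap implementation: it swaps the accumulation-driven approach for a simple sliding window over words, and the function name chunk_with_overlap and its parameters are hypothetical.

```python
from typing import List


def chunk_with_overlap(
    words: List[str], chunk_size: int, overlap_ratio: float
) -> List[List[str]]:
    """Split words into windows that share a fraction of their content."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")

    # Number of trailing words repeated at the start of the next chunk.
    overlap = int(chunk_size * overlap_ratio)
    # The window always advances by at least one word because overlap < chunk_size.
    step = chunk_size - overlap

    chunks: List[List[str]] = []
    start = 0
    while start < len(words):
        chunks.append(words[start : start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the input
        start += step
    return chunks


example_chunks = chunk_with_overlap(list("abcdefghij"), chunk_size=4, overlap_ratio=0.5)
```

With a ratio of 0.5 and a chunk size of 4, each chunk begins with the last two words of the previous one, which is the overlap preservation the summary refers to.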

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant Config as backend_access_control_enabled()
    participant Env as Environment
    participant DBConfig as Graph/Vector DB Config
    participant Flow as Feature Logic

    App->>Config: Check if backend access control is enabled
    Config->>Env: Read ENABLE_BACKEND_ACCESS_CONTROL
    alt ENABLE_BACKEND_ACCESS_CONTROL set
        Config->>DBConfig: Call get_graph_context_config() & get_vectordb_context_config()
        DBConfig-->>Config: Return provider info
        Config->>Config: Validate multi-user support (kuzu, lancedb)
        alt Providers supported
            Config-->>App: Return True
        else Unsupported providers
            Config-->>App: Raise EnvironmentError
        end
    else ENABLE_BACKEND_ACCESS_CONTROL not set
        Config->>Config: Call multi_user_support_possible()
        alt Multi-user support possible
            Config-->>App: Return True
        else Not possible
            Config-->>App: Return False
        end
    end
    Flow->>App: Apply access control or skip
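
The decision flow in the diagram above can be sketched roughly as follows. This is only an approximation under stated assumptions: the real functions read provider information from get_graph_context_config() and get_vectordb_context_config(), whereas this sketch takes the provider names and an environment mapping as hypothetical parameters. The handling of an explicit non-true value is also an assumption, since the diagram only distinguishes "set" from "not set".

```python
import os
from typing import Mapping, Optional

# Providers the diagram names as supporting multi-user mode.
MULTI_USER_GRAPH_PROVIDERS = {"kuzu"}
MULTI_USER_VECTOR_PROVIDERS = {"lancedb"}


def multi_user_support_possible(graph_provider: str, vector_provider: str) -> bool:
    return (
        graph_provider in MULTI_USER_GRAPH_PROVIDERS
        and vector_provider in MULTI_USER_VECTOR_PROVIDERS
    )


def backend_access_control_enabled(
    graph_provider: str,
    vector_provider: str,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    env = os.environ if env is None else env
    raw = env.get("ENABLE_BACKEND_ACCESS_CONTROL")

    if raw is None:
        # Variable not set: enable access control only if the providers allow it.
        return multi_user_support_possible(graph_provider, vector_provider)

    if raw.strip().lower() != "true":
        # Assumption: an explicit non-true value switches the feature off.
        return False

    if not multi_user_support_possible(graph_provider, vector_provider):
        # Explicitly requested, but the configured databases cannot support it.
        raise EnvironmentError(
            "ENABLE_BACKEND_ACCESS_CONTROL is set, but the configured "
            "graph/vector providers do not support multi-user mode"
        )
    return True
```

Passing the environment as a parameter keeps the sketch easy to test without mutating os.environ.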

sequenceDiagram
    participant Client as Client
    participant Pipeline as persist_sessions_in_knowledge_graph_pipeline
    participant Context as Context Setup
    participant Extract as extract_user_sessions
    participant Cognify as cognify_session
    participant Memify as memify

    Client->>Pipeline: Call with user, session_ids, dataset
    Pipeline->>Context: set_session_user_context_variable(user)
    Pipeline->>Context: Retrieve & validate dataset write access
    Pipeline->>Context: set_database_global_context_variables(dataset.id, owner_id)
    Pipeline->>Extract: Build extraction task
    Extract-->>Pipeline: Generator yields formatted QA data
    Pipeline->>Cognify: Build enrichment task (cognify_session)
    Pipeline->>Memify: Invoke with tasks, dataset, data
    Memify-->>Pipeline: Process & return result
    Pipeline-->>Client: Return memify result

sequenceDiagram
    participant Caller as Caller
    participant Retriever as Retriever (get_completion)
    participant Completion as generate_completion()
    participant LLM as LLM Engine

    Caller->>Retriever: get_completion(query, context, session_id, response_model=MyModel)
    Retriever->>Completion: Call with response_model
    Completion->>LLM: Request generation with structured output (MyModel)
    LLM-->>Completion: Return structured response
    Completion-->>Retriever: Return response
    Retriever->>Retriever: Wrap in List[Any]
    Retriever-->>Caller: Return [response] or [] based on result
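
A rough sketch of the response_model flow shown in the diagram above, assuming a fake completion function in place of the LLM engine. The names fake_generate_completion and Answer are hypothetical, and the simplified get_completion signature omits session_id; only the response_model: Type = str parameter and the List[Any] return shape come from the walkthrough.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, List, Type


@dataclass
class Answer:
    # Hypothetical structured output type, used only for this sketch.
    text: str


async def fake_generate_completion(
    query: str, context: str, response_model: Type = str
) -> Any:
    # Stand-in for the LLM call: returns plain text for str,
    # otherwise an instance of the requested model.
    raw = f"{query} | {context}"
    return raw if response_model is str else response_model(raw)


async def get_completion(
    query: str, context: str = "", response_model: Type = str
) -> List[Any]:
    completion = await fake_generate_completion(query, context, response_model)
    # Wrap the single result in a list; an empty result yields an empty list.
    return [completion] if completion else []


plain = asyncio.run(get_completion("q", "ctx"))
structured = asyncio.run(get_completion("q", "ctx", response_model=Answer))
```

Callers that previously expected a bare string would need to index into the returned list, which is the behavior change the walkthrough flags.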

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Key areas requiring careful attention:

Backend access control integration: Verify multi_user_support_possible() and backend_access_control_enabled() logic across context_global_variables.py and all call sites; ensure proper environment validation and error handling for unsupported DB combinations.
Retrieval system refactoring: Cross-check all retriever implementations (GraphCompletionRetriever, EntityCompletionRetriever, etc.) for consistent response_model propagation; verify return type changes from str/single-value to List[Any] across cached and non-cached code paths.
Session persistence pipeline: Review persist_sessions_in_knowledge_graph_pipeline for proper user context setting, write-access validation, and error propagation; validate extract_user_sessions and cognify_session task orchestration and exception handling.
Edge text functionality: Confirm edge_text auto-population in Edge model, downstream usage in distance lookups (CogneeGraph, brute_force_triplet_search), and consistent inclusion in expand_with_nodes_and_edges tuples.
Test coverage: Validate new test implementations (TextChunkerWithOverlap, memify tasks, structured output retrievers) for proper mocking and assertion logic.
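
As an illustration of the edge_text item above, the distance lookup "with relationship_type fallback" could look roughly like this. The helper names, the dictionary-based edge representation, and the default distance are all assumptions made for this sketch, not the CogneeGraph implementation.

```python
from typing import Dict, Optional


def edge_distance_key(edge_attributes: Dict[str, Optional[str]]) -> Optional[str]:
    # Prefer edge_text; fall back to relationship_type when it is missing or empty.
    return edge_attributes.get("edge_text") or edge_attributes.get("relationship_type")


def lookup_distance(
    edge_attributes: Dict[str, Optional[str]],
    distances: Dict[str, float],
    default: float = float("inf"),
) -> float:
    key = edge_distance_key(edge_attributes)
    if key is None:
        return default
    return distances.get(key, default)


distances = {"Alice works at Acme": 0.12, "works_at": 0.4}
with_text = {"edge_text": "Alice works at Acme", "relationship_type": "works_at"}
without_text = {"relationship_type": "works_at"}
```

Here with_text resolves through its edge_text, while without_text falls back to the relationship type, so edges created before the field existed would still get a distance.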

Possibly related PRs

test: fix weighted edges example #1745: Directly related — also modifies ENABLE_BACKEND_ACCESS_CONTROL default in .env.template and backend_access_control/multi-user support logic in context_global_variables.
feat: optimize repeated entity extraction #1682: Directly related — both modify Edge model and introduce/use edge_text property, including auto-population, indexing, and retrieval integration.
Feat/cog 1365 unify retrievers #572: Directly related — both refactor retriever framework, modifying BaseRetriever/BaseGraphRetriever signatures and completion utility functions (renaming generate_structured_completion → generate_completion).

Suggested labels

run-checks, community-contribution, review-required

Suggested reviewers

borisarzentar
dexters1
alekszievr

Poem

🐰 A rabbit hops through structured flows,
With edge_text metadata that glows,
Sessions persist in knowledge graphs deep,
While overlapping chunks their boundaries keep,
Access control now finds its place—
Come search, retrieve, and embrace the grace! 🌿✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The PR title claims to add a label field to DataItem, but the changeset includes extensive modifications across 70+ files affecting backend access control, edge text handling, retrieval systems, chunking, and more.	Revise the title to reflect the full scope of changes, such as 'feat: add label field to DataItem and implement backend access control with multi-user support' or split into multiple focused PRs.
Description check	⚠️ Warning	The PR description only mentions adding a label field to DataItem, but the changeset implements backend access control, edge text metadata, retrieval response modeling, chunking with overlap, and session persistence—representing a major feature set not disclosed in the description.	Update the description to comprehensively document all implemented features, their relationships, rationale, and potential breaking changes or migration requirements.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


github-actions

Hello @mohitk-patwari, thank you for submitting a PR! We will respond as soon as possible.

coderabbitai

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

cognee/tests/test_edge_ingestion.py (1)
14-14: Missing pytest.mark.asyncio decorator.

The test function is async but lacks the @pytest.mark.asyncio decorator. As per coding guidelines, async tests should use this decorator to ensure proper execution by pytest.

Apply this diff:
+import pytest
+
+@pytest.mark.asyncio
 async def test_edge_ingestion():
cognee/modules/retrieval/graph_completion_retriever.py (1)
205-208: Ensure saved interactions serialize structured responses

When response_model returns a non-string object (dataclass/Pydantic), the code still forwards that object to save_qa, which concatenates it with question at Line 234. That raises TypeError: can only concatenate str (not "<type>") to str, so any caller enabling structured outputs while save_interaction=True will crash. Convert completions to strings (or another serializable form) before saving.

Apply this diff to preserve existing behavior for strings while safely handling structured outputs:
@@
-        if self.save_interaction and context and triplets and completion:
-            await self.save_qa(
-                question=query, answer=completion, context=context_text, triplets=triplets
-            )
+        if self.save_interaction and context and triplets and completion:
+            serialized_completion = (
+                completion if isinstance(completion, str) else str(completion)
+            )
+            await self.save_qa(
+                question=query,
+                answer=serialized_completion,
+                context=context_text,
+                triplets=triplets,
+            )
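
The failure mode described above can be reproduced in isolation. In this sketch, build_qa_text is a hypothetical stand-in for the string concatenation inside save_qa, and StructuredAnswer is an assumed structured response type; neither is taken from the codebase.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class StructuredAnswer:
    # Hypothetical structured completion, standing in for a response_model instance.
    answer: str


def build_qa_text(question: str, answer: Any) -> str:
    # Stand-in for the concatenation the review attributes to save_qa.
    return question + " " + answer


def serialize_completion(completion: Any) -> str:
    # Strings pass through unchanged; anything else is converted with str().
    return completion if isinstance(completion, str) else str(completion)


structured = StructuredAnswer(answer="42")

try:
    build_qa_text("What is the answer?", structured)
    failed = False
except TypeError:
    # Concatenating str with a non-str object raises TypeError.
    failed = True

safe_text = build_qa_text("What is the answer?", serialize_completion(structured))
```

Using str() keeps the fix small; a project that wants JSON in the stored interaction might prefer a model-specific serializer instead.
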
cognee/modules/retrieval/graph_completion_cot_retriever.py (1)
167-199: Fix docstring inconsistencies.

The docstring has two inconsistencies with the actual method signature:

Line 188: Documents context as Optional[Any], but the signature specifies Optional[List[Edge]]

Line 199: Documents return type as List[str], but the signature returns List[Any]

Apply this diff to fix the docstring:
-            - context (Optional[Any]): Optional context that may assist in answering the query.
+            - context (Optional[List[Edge]]): Optional context that may assist in answering the query.
               If not provided, it will be fetched based on the query. (default None)
             - session_id (Optional[str]): Optional session identifier for caching. If None,
               defaults to 'default_session'. (default None)
             - max_iter: The maximum number of iterations to refine the answer and generate
               follow-up questions. (default 4)
             - response_model (Type): The Pydantic model type for structured output. (default str)
 
         Returns:
         --------
 
-            - List[str]: A list containing the generated answer to the user's query.
+            - List[Any]: A list containing the generated answer to the user's query.

🧹 Nitpick comments (8)

cognee/modules/pipelines/models/DataItem.py (1)
1-4: Consider using relative import for consistency.

While the absolute import on line 3 is correct, the module's __init__.py uses relative imports (e.g., from .DataItemStatus import DataItemStatus). For consistency with the existing codebase pattern, consider using a relative import.

Apply this diff if you prefer to align with the existing import style:
-from cognee.modules.pipelines.models.DataItemStatus import DataItemStatus
+from .DataItemStatus import DataItemStatus
cognee/tests/unit/modules/pipelines/test_data_item_label.py (1)
4-12: Consider adding test coverage for the default label value.

The test correctly validates the label field when explicitly provided. To ensure comprehensive coverage of the new optional field, consider adding a test case that verifies the default behavior (label=None when not provided).

Add this test function to verify the default case:
def test_data_item_label_field_default():
    item = DataItem(
        id="124",
        name="Sample Item Without Label",
        source="mock_source",
        status=DataItemStatus.DATA_ITEM_PROCESSING_COMPLETED
    )
    assert item.label is None
cognee/modules/retrieval/cypher_search_retriever.py (1)
55-55: Remove unnecessary jsonable_encoder wrapper for query results.

Both graph backends return JSON-serializable types:

Neo4j adapter returns List[Dict[str, Any]] directly from result.data()

Kuzu adapter returns List[Tuple] with scalar values

Since the query results are already JSON-serializable, jsonable_encoder adds unnecessary overhead:
result = await graph_engine.query(query)
Remove the jsonable_encoder wrapper at line 55 unless query results contain Pydantic models or custom objects requiring special encoding.
cognee/tasks/feedback/generate_improved_answers.py (1)
72-76: Consider defensive attribute access for completion[0].

While response_model=ImprovedAnswerResponse should ensure the correct type, accessing completion[0].answer and completion[0].explanation without verification could raise AttributeError if the response_model contract is not met. Consider wrapping these accesses in a try-except block or adding a type check.

Apply this diff to add defensive handling:
         if completion:
-            enrichment.improved_answer = completion[0].answer
+            try:
+                enrichment.improved_answer = completion[0].answer
+                enrichment.explanation = completion[0].explanation
+            except (AttributeError, IndexError) as e:
+                logger.warning(
+                    "Unexpected completion format",
+                    error=str(e),
+                    question=enrichment.question,
+                )
+                return None
             enrichment.new_context = new_context_text
-            enrichment.explanation = completion[0].explanation
             return enrichment
cognee/tests/test_search_db.py (1)
150-155: Consider moving import to module level.

The import of backend_access_control_enabled is placed inside the loop. While this works, moving it to the module level (alongside other imports) would improve readability and follow standard Python conventions.

Apply this diff:
 from cognee.modules.search.types import SearchType
 from collections import Counter
+from cognee.context_global_variables import backend_access_control_enabled

 logger = get_logger()
 
 ...
 
     for name, search_results in [
         ...
     ]:
         assert isinstance(search_results, list), f"{name}: should return a list"
         assert len(search_results) == 1, (
             f"{name}: expected single-element list, got {len(search_results)}"
         )
 
-        from cognee.context_global_variables import backend_access_control_enabled
-
         if backend_access_control_enabled():
             text = search_results[0]["search_result"][0]
         else:
             text = search_results[0]
cognee/tasks/memify/cognify_session.py (1)
40-41: Improve exception handling following Python best practices.

The error handling can be improved in two ways:

Use logger.exception() instead of logger.error() to automatically include the traceback

Chain the exception with raise ... from e to preserve the exception context

Apply this diff:
-        logger.error(f"Error cognifying session data: {str(e)}")
-        raise CogneeSystemError(message=f"Failed to cognify session data: {str(e)}", log=False)
+        logger.exception("Error cognifying session data")
+        raise CogneeSystemError(message=f"Failed to cognify session data: {str(e)}", log=False) from e
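
To show what the `from e` clause buys, here is a small self-contained sketch. The CogneeSystemError class below is a simplified stand-in with an assumed constructor, and cognify_session_sketch is a hypothetical function that only mimics the error path.

```python
class CogneeSystemError(Exception):
    # Simplified stand-in for the project's exception class (assumed signature).
    def __init__(self, message: str, log: bool = True):
        super().__init__(message)
        self.message = message
        self.log = log


def cognify_session_sketch(data: str) -> str:
    try:
        if not data:
            raise ValueError("session data is empty")
        return data.upper()
    except Exception as e:
        # "from e" records the original exception as __cause__ on the new one.
        raise CogneeSystemError(
            message=f"Failed to cognify session data: {e}", log=False
        ) from e


try:
    cognify_session_sketch("")
except CogneeSystemError as err:
    caught = err
```

With explicit chaining, tracebacks show the original ValueError as the direct cause rather than as an error that happened during handling.
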
cognee/modules/retrieval/EntityCompletionRetriever.py (1)

114-116: Docstring return type should match the new signature
Line 114 still documents List[str], but the method now returns List[Any] when a structured response_model is supplied. Please align the docstring with the actual return shape so downstream callers aren’t misled.

cognee/modules/retrieval/graph_completion_context_extension_retriever.py (1)

85-87: Docstring return type should reflect structured responses
Line 85 still advertises a List[str], but the signature now returns List[Any] to accommodate custom response_model types. Please update the docstring so it mirrors the broadened return type.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 487635b and 763a05d.

⛔ Files ignored due to path filters (9)

.github/workflows/examples_tests.yml is excluded by !**/*.yml
.github/workflows/load_tests.yml is excluded by !**/*.yml
.github/workflows/release_test.yml is excluded by !**/*.yml
.github/workflows/search_db_tests.yml is excluded by !**/*.yml
.github/workflows/test_different_operating_systems.yml is excluded by !**/*.yml
.github/workflows/test_suites.yml is excluded by !**/*.yml
2wikimultihop_dev.json is excluded by !**/*.json
docker-compose.yml is excluded by !**/*.yml
hotpot_benchmark.json is excluded by !**/*.json

📒 Files selected for processing (68)

.env.template (1 hunks)
cognee/__init__.py (1 hunks)
cognee/api/client.py (2 hunks)
cognee/context_global_variables.py (3 hunks)
cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (1 hunks)
cognee/infrastructure/databases/vector/create_vector_engine.py (1 hunks)
cognee/infrastructure/engine/models/Edge.py (2 hunks)
cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py (1 hunks)
cognee/modules/chunking/models/DocumentChunk.py (2 hunks)
cognee/modules/chunking/text_chunker_with_overlap.py (1 hunks)
cognee/modules/graph/cognee_graph/CogneeGraph.py (1 hunks)
cognee/modules/graph/utils/expand_with_nodes_and_edges.py (2 hunks)
cognee/modules/graph/utils/resolve_edges_to_text.py (1 hunks)
cognee/modules/pipelines/models/DataItem.py (1 hunks)
cognee/modules/pipelines/models/__init__.py (1 hunks)
cognee/modules/retrieval/EntityCompletionRetriever.py (5 hunks)
cognee/modules/retrieval/base_graph_retriever.py (2 hunks)
cognee/modules/retrieval/base_retriever.py (2 hunks)
cognee/modules/retrieval/completion_retriever.py (6 hunks)
cognee/modules/retrieval/cypher_search_retriever.py (2 hunks)
cognee/modules/retrieval/graph_completion_context_extension_retriever.py (5 hunks)
cognee/modules/retrieval/graph_completion_cot_retriever.py (4 hunks)
cognee/modules/retrieval/graph_completion_retriever.py (3 hunks)
cognee/modules/retrieval/temporal_retriever.py (4 hunks)
cognee/modules/retrieval/utils/brute_force_triplet_search.py (1 hunks)
cognee/modules/retrieval/utils/completion.py (2 hunks)
cognee/modules/run_custom_pipeline/__init__.py (1 hunks)
cognee/modules/run_custom_pipeline/run_custom_pipeline.py (1 hunks)
cognee/modules/search/methods/search.py (3 hunks)
cognee/modules/users/methods/get_authenticated_user.py (1 hunks)
cognee/modules/users/methods/get_default_user.py (1 hunks)
cognee/tasks/feedback/generate_improved_answers.py (2 hunks)
cognee/tasks/memify/__init__.py (1 hunks)
cognee/tasks/memify/cognify_session.py (1 hunks)
cognee/tasks/memify/extract_user_sessions.py (1 hunks)
cognee/tasks/storage/index_data_points.py (1 hunks)
cognee/tasks/storage/index_graph_edges.py (3 hunks)
cognee/tests/test_add_docling_document.py (1 hunks)
cognee/tests/test_conversation_history.py (3 hunks)
cognee/tests/test_edge_ingestion.py (1 hunks)
cognee/tests/test_feedback_enrichment.py (1 hunks)
cognee/tests/test_library.py (1 hunks)
cognee/tests/test_load.py (1 hunks)
cognee/tests/test_relational_db_migration.py (1 hunks)
cognee/tests/test_search_db.py (1 hunks)
cognee/tests/unit/api/test_conditional_authentication_endpoints.py (6 hunks)
cognee/tests/unit/infrastructure/databases/test_index_data_points.py (1 hunks)
cognee/tests/unit/infrastructure/databases/test_index_graph_edges.py (2 hunks)
cognee/tests/unit/modules/chunking/test_text_chunker.py (1 hunks)
cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py (1 hunks)
cognee/tests/unit/modules/memify_tasks/test_cognify_session.py (1 hunks)
cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py (1 hunks)
cognee/tests/unit/modules/pipelines/test_data_item_label.py (1 hunks)
cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py (0 hunks)
cognee/tests/unit/modules/retrieval/rag_completion_retriever_test.py (1 hunks)
cognee/tests/unit/modules/retrieval/structured_output_test.py (1 hunks)
cognee/tests/unit/modules/retrieval/summaries_retriever_test.py (1 hunks)
cognee/tests/unit/modules/retrieval/temporal_retriever_test.py (0 hunks)
cognee/tests/unit/modules/users/test_conditional_authentication.py (0 hunks)
entrypoint.sh (1 hunks)
examples/python/agentic_reasoning_procurement_example.py (1 hunks)
examples/python/code_graph_example.py (2 hunks)
examples/python/conversation_session_persistence_example.py (1 hunks)
examples/python/feedback_enrichment_minimal_example.py (0 hunks)
examples/python/memify_coding_agent_example.py (1 hunks)
examples/python/relational_database_migration_example.py (1 hunks)
examples/python/run_custom_pipeline_example.py (1 hunks)
examples/python/simple_example.py (0 hunks)

💤 Files with no reviewable changes (5)

examples/python/feedback_enrichment_minimal_example.py
cognee/tests/unit/modules/retrieval/temporal_retriever_test.py
examples/python/simple_example.py
cognee/tests/unit/modules/users/test_conditional_authentication.py
cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py

🧰 Additional context used

📓 Path-based instructions (5)

{cognee,cognee-mcp,distributed,examples,alembic}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

{cognee,cognee-mcp,distributed,examples,alembic}/**/*.py: Use 4-space indentation; name modules and functions in snake_case; name classes in PascalCase (Python)
Adhere to ruff rules, including import hygiene and configured line length (100)
Keep Python lines ≤ 100 characters

Files:

examples/python/code_graph_example.py
cognee/tests/unit/modules/retrieval/summaries_retriever_test.py
cognee/modules/run_custom_pipeline/run_custom_pipeline.py
cognee/modules/graph/cognee_graph/CogneeGraph.py
cognee/tests/test_library.py
cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py
cognee/modules/pipelines/models/DataItem.py
cognee/tests/test_add_docling_document.py
cognee/modules/search/methods/search.py
cognee/context_global_variables.py
cognee/tasks/feedback/generate_improved_answers.py
cognee/tests/unit/modules/retrieval/rag_completion_retriever_test.py
cognee/tests/test_search_db.py
cognee/tasks/storage/index_graph_edges.py
cognee/tasks/storage/index_data_points.py
cognee/tests/unit/modules/chunking/test_text_chunker.py
cognee/modules/run_custom_pipeline/__init__.py
cognee/tests/test_relational_db_migration.py
cognee/modules/pipelines/models/__init__.py
cognee/modules/retrieval/graph_completion_retriever.py
cognee/tests/unit/infrastructure/databases/test_index_data_points.py
cognee/tasks/memify/extract_user_sessions.py
examples/python/run_custom_pipeline_example.py
cognee/api/client.py
cognee/modules/retrieval/completion_retriever.py
cognee/modules/users/methods/get_default_user.py
cognee/tests/unit/modules/memify_tasks/test_cognify_session.py
cognee/tasks/memify/cognify_session.py
cognee/modules/chunking/models/DocumentChunk.py
cognee/__init__.py
examples/python/conversation_session_persistence_example.py
cognee/infrastructure/engine/models/Edge.py
cognee/modules/retrieval/utils/brute_force_triplet_search.py
cognee/tests/test_feedback_enrichment.py
cognee/modules/retrieval/graph_completion_context_extension_retriever.py
examples/python/memify_coding_agent_example.py
cognee/modules/graph/utils/expand_with_nodes_and_edges.py
cognee/infrastructure/databases/vector/create_vector_engine.py
cognee/tasks/memify/__init__.py
cognee/modules/retrieval/base_graph_retriever.py
cognee/modules/chunking/text_chunker_with_overlap.py
cognee/modules/retrieval/temporal_retriever.py
cognee/tests/test_load.py
cognee/modules/retrieval/base_retriever.py
cognee/modules/retrieval/cypher_search_retriever.py
cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py
cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py
cognee/tests/unit/api/test_conditional_authentication_endpoints.py
examples/python/agentic_reasoning_procurement_example.py
cognee/tests/unit/infrastructure/databases/test_index_graph_edges.py
cognee/tests/test_conversation_history.py
cognee/modules/graph/utils/resolve_edges_to_text.py
cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py
cognee/tests/test_edge_ingestion.py
examples/python/relational_database_migration_example.py
cognee/modules/retrieval/utils/completion.py
cognee/modules/users/methods/get_authenticated_user.py
cognee/modules/retrieval/EntityCompletionRetriever.py
cognee/tests/unit/modules/retrieval/structured_output_test.py
cognee/tests/unit/modules/pipelines/test_data_item_label.py
cognee/modules/retrieval/graph_completion_cot_retriever.py

examples/python/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

When adding public APIs, provide or update targeted examples under examples/python/

Files:

examples/python/code_graph_example.py
examples/python/run_custom_pipeline_example.py
examples/python/conversation_session_persistence_example.py
examples/python/memify_coding_agent_example.py
examples/python/agentic_reasoning_procurement_example.py
examples/python/relational_database_migration_example.py

cognee/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

cognee/**/*.py: Public APIs in the core library should be type-annotated where practical
Prefer explicit, structured error handling and use shared logging utilities from cognee.shared.logging_utils

Files:

cognee/tests/unit/modules/retrieval/summaries_retriever_test.py
cognee/modules/run_custom_pipeline/run_custom_pipeline.py
cognee/modules/graph/cognee_graph/CogneeGraph.py
cognee/tests/test_library.py
cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py
cognee/modules/pipelines/models/DataItem.py
cognee/tests/test_add_docling_document.py
cognee/modules/search/methods/search.py
cognee/context_global_variables.py
cognee/tasks/feedback/generate_improved_answers.py
cognee/tests/unit/modules/retrieval/rag_completion_retriever_test.py
cognee/tests/test_search_db.py
cognee/tasks/storage/index_graph_edges.py
cognee/tasks/storage/index_data_points.py
cognee/tests/unit/modules/chunking/test_text_chunker.py
cognee/modules/run_custom_pipeline/__init__.py
cognee/tests/test_relational_db_migration.py
cognee/modules/pipelines/models/__init__.py
cognee/modules/retrieval/graph_completion_retriever.py
cognee/tests/unit/infrastructure/databases/test_index_data_points.py
cognee/tasks/memify/extract_user_sessions.py
cognee/api/client.py
cognee/modules/retrieval/completion_retriever.py
cognee/modules/users/methods/get_default_user.py
cognee/tests/unit/modules/memify_tasks/test_cognify_session.py
cognee/tasks/memify/cognify_session.py
cognee/modules/chunking/models/DocumentChunk.py
cognee/__init__.py
cognee/infrastructure/engine/models/Edge.py
cognee/modules/retrieval/utils/brute_force_triplet_search.py
cognee/tests/test_feedback_enrichment.py
cognee/modules/retrieval/graph_completion_context_extension_retriever.py
cognee/modules/graph/utils/expand_with_nodes_and_edges.py
cognee/infrastructure/databases/vector/create_vector_engine.py
cognee/tasks/memify/__init__.py
cognee/modules/retrieval/base_graph_retriever.py
cognee/modules/chunking/text_chunker_with_overlap.py
cognee/modules/retrieval/temporal_retriever.py
cognee/tests/test_load.py
cognee/modules/retrieval/base_retriever.py
cognee/modules/retrieval/cypher_search_retriever.py
cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py
cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py
cognee/tests/unit/api/test_conditional_authentication_endpoints.py
cognee/tests/unit/infrastructure/databases/test_index_graph_edges.py
cognee/tests/test_conversation_history.py
cognee/modules/graph/utils/resolve_edges_to_text.py
cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py
cognee/tests/test_edge_ingestion.py
cognee/modules/retrieval/utils/completion.py
cognee/modules/users/methods/get_authenticated_user.py
cognee/modules/retrieval/EntityCompletionRetriever.py
cognee/tests/unit/modules/retrieval/structured_output_test.py
cognee/tests/unit/modules/pipelines/test_data_item_label.py
cognee/modules/retrieval/graph_completion_cot_retriever.py

cognee/tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Place unit tests under cognee/tests/unit/

Files:

cognee/tests/unit/modules/retrieval/summaries_retriever_test.py
cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py
cognee/tests/unit/modules/retrieval/rag_completion_retriever_test.py
cognee/tests/unit/modules/chunking/test_text_chunker.py
cognee/tests/unit/infrastructure/databases/test_index_data_points.py
cognee/tests/unit/modules/memify_tasks/test_cognify_session.py
cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py
cognee/tests/unit/api/test_conditional_authentication_endpoints.py
cognee/tests/unit/infrastructure/databases/test_index_graph_edges.py
cognee/tests/unit/modules/retrieval/structured_output_test.py
cognee/tests/unit/modules/pipelines/test_data_item_label.py

cognee/tests/**/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

cognee/tests/**/test_*.py: Name test files as test_*.py
Use pytest.mark.asyncio for async tests
Tests should avoid external state; rely on fixtures and CI-provided env vars when providers are required

Files:

cognee/tests/test_library.py
cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py
cognee/tests/test_add_docling_document.py
cognee/tests/test_search_db.py
cognee/tests/unit/modules/chunking/test_text_chunker.py
cognee/tests/test_relational_db_migration.py
cognee/tests/unit/infrastructure/databases/test_index_data_points.py
cognee/tests/unit/modules/memify_tasks/test_cognify_session.py
cognee/tests/test_feedback_enrichment.py
cognee/tests/test_load.py
cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py
cognee/tests/unit/api/test_conditional_authentication_endpoints.py
cognee/tests/unit/infrastructure/databases/test_index_graph_edges.py
cognee/tests/test_conversation_history.py
cognee/tests/test_edge_ingestion.py
cognee/tests/unit/modules/pipelines/test_data_item_label.py

🧠 Learnings (8)

📚 Learning: 2024-11-13T16:17:17.646Z

Learnt from: hajdul88
Repo: topoteretes/cognee PR: 196
File: cognee/modules/graph/cognee_graph/CogneeGraphElements.py:82-90
Timestamp: 2024-11-13T16:17:17.646Z
Learning: In `cognee/modules/graph/cognee_graph/CogneeGraphElements.py`, within the `Edge` class, nodes and edges can have different dimensions, and it's acceptable for them not to match.

Applied to files:

cognee/modules/graph/cognee_graph/CogneeGraph.py
cognee/modules/chunking/models/DocumentChunk.py
cognee/modules/retrieval/graph_completion_context_extension_retriever.py
cognee/modules/graph/utils/expand_with_nodes_and_edges.py
cognee/modules/retrieval/base_graph_retriever.py

📚 Learning: 2024-11-13T16:06:32.576Z

Learnt from: hajdul88
Repo: topoteretes/cognee PR: 196
File: cognee/modules/graph/cognee_graph/CogneeGraph.py:32-38
Timestamp: 2024-11-13T16:06:32.576Z
Learning: In `CogneeGraph.py`, within the `CogneeGraph` class, it's intentional to add skeleton edges in both the `add_edge` method and the `project_graph_from_db` method to ensure that edges are added to the graph and to the nodes.

Applied to files:

cognee/modules/graph/cognee_graph/CogneeGraph.py
cognee/modules/retrieval/utils/brute_force_triplet_search.py
cognee/modules/graph/utils/expand_with_nodes_and_edges.py

📚 Learning: 2024-11-13T14:55:05.912Z

Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.

Applied to files:

cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py
cognee/tests/unit/modules/chunking/test_text_chunker.py
cognee/modules/chunking/models/DocumentChunk.py
cognee/modules/chunking/text_chunker_with_overlap.py

📚 Learning: 2024-12-04T18:37:55.092Z

Learnt from: hajdul88
Repo: topoteretes/cognee PR: 251
File: cognee/tests/infrastructure/databases/test_index_graph_edges.py:0-0
Timestamp: 2024-12-04T18:37:55.092Z
Learning: In the `index_graph_edges` function, both graph engine and vector engine initialization failures are handled within the same try-except block, so a single test covers both cases.

Applied to files:

cognee/tasks/storage/index_graph_edges.py
cognee/tests/unit/infrastructure/databases/test_index_data_points.py
cognee/tests/unit/infrastructure/databases/test_index_graph_edges.py

📚 Learning: 2025-10-11T04:18:24.594Z

Learnt from: Vattikuti-Manideep-Sitaram
Repo: topoteretes/cognee PR: 1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:69-74
Timestamp: 2025-10-11T04:18:24.594Z
Learning: The code_graph_pipeline.py and ontology_graph_pipeline.py both follow an established pattern of calling cognee.prune.prune_data() and cognee.prune.prune_system(metadata=True) at the start of pipeline execution. This appears to be intentional behavior for pipeline operations in the cognee codebase.

Applied to files:

examples/python/run_custom_pipeline_example.py

📚 Learning: 2025-10-27T09:21:14.154Z

Learnt from: CR
Repo: topoteretes/cognee PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-27T09:21:14.154Z
Learning: Applies to cognee/tests/unit/**/*.py : Place unit tests under cognee/tests/unit/

Applied to files:

cognee/tests/unit/modules/memify_tasks/test_cognify_session.py

📚 Learning: 2025-10-27T09:21:14.154Z

Learnt from: CR
Repo: topoteretes/cognee PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-27T09:21:14.154Z
Learning: Applies to cognee/tests/**/test_*.py : Use pytest.mark.asyncio for async tests

Applied to files:

cognee/tests/unit/modules/memify_tasks/test_cognify_session.py
cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py

📚 Learning: 2025-10-27T09:21:14.154Z

Learnt from: CR
Repo: topoteretes/cognee PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-27T09:21:14.154Z
Learning: Applies to cognee/tests/**/test_*.py : Tests should avoid external state; rely on fixtures and CI-provided env vars when providers are required

Applied to files:

cognee/tests/unit/api/test_conditional_authentication_endpoints.py

🧬 Code graph analysis (38)

cognee/modules/run_custom_pipeline/run_custom_pipeline.py (3)

cognee/shared/logging_utils.py (1)

get_logger (212-224)

cognee/modules/pipelines/tasks/task.py (1)

Task (5-97)

cognee/modules/pipelines/layers/pipeline_execution_mode.py (1)

get_pipeline_executor (117-127)

cognee/tests/test_library.py (1)

cognee/modules/search/types/SearchType.py (1)

SearchType (4-19)

cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py (2)

cognee/modules/chunking/text_chunker_with_overlap.py (2)

TextChunkerWithOverlap (11-124)

read (112-124)

cognee/tasks/chunks/chunk_by_paragraph.py (1)

chunk_by_paragraph (7-96)

cognee/modules/pipelines/models/DataItem.py (1)

cognee/modules/pipelines/models/DataItemStatus.py (1)

DataItemStatus (4-5)

cognee/modules/search/methods/search.py (1)

cognee/context_global_variables.py (1)

backend_access_control_enabled (36-50)

cognee/context_global_variables.py (2)

cognee/infrastructure/databases/vector/config.py (1)

get_vectordb_context_config (84-90)

cognee/infrastructure/databases/graph/config.py (1)

get_graph_context_config (140-148)

cognee/tasks/feedback/generate_improved_answers.py (2)

cognee/modules/retrieval/EntityCompletionRetriever.py (1)

get_completion (87-165)

cognee/modules/retrieval/completion_retriever.py (1)

get_completion (77-147)

cognee/tests/test_search_db.py (1)

cognee/context_global_variables.py (1)

backend_access_control_enabled (36-50)

cognee/tasks/storage/index_graph_edges.py (5)

cognee/modules/engine/utils/generate_edge_id.py (1)

generate_edge_id (4-5)

cognee/infrastructure/databases/graph/get_graph_engine.py (1)

get_graph_engine (10-24)

cognee/tasks/storage/index_data_points.py (1)

index_data_points (10-65)

cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)

index_data_points (251-263)

cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)

index_data_points (297-319)

cognee/tasks/storage/index_data_points.py (6)

cognee/infrastructure/databases/vector/get_vector_engine.py (1)

get_vector_engine (5-7)

cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (2)

create_vector_index (292-295)

index_data_points (297-309)

cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (2)

create_vector_index (248-249)

index_data_points (251-263)

cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (2)

create_vector_index (285-295)

index_data_points (297-319)

cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)

get_batch_size (140-147)

cognee/infrastructure/databases/vector/embeddings/EmbeddingEngine.py (1)

get_batch_size (38-45)

cognee/tests/unit/modules/chunking/test_text_chunker.py (3)

cognee/modules/chunking/TextChunker.py (1)

TextChunker (11-78)

cognee/modules/chunking/text_chunker_with_overlap.py (2)

TextChunkerWithOverlap (11-124)

read (112-124)

cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py (4)

make_text_generator (14-24)

_factory (17-22)

_factory (31-43)

gen (18-20)

cognee/modules/run_custom_pipeline/__init__.py (1)

cognee/modules/run_custom_pipeline/run_custom_pipeline.py (1)

run_custom_pipeline (14-69)

cognee/modules/pipelines/models/__init__.py (1)

cognee/modules/pipelines/models/DataItem.py (1)

DataItem (6-11)

cognee/tests/unit/infrastructure/databases/test_index_data_points.py (2)

cognee/tasks/storage/index_data_points.py (1)

index_data_points (10-65)

cognee/infrastructure/engine/models/DataPoint.py (1)

DataPoint (20-220)

cognee/tasks/memify/extract_user_sessions.py (3)

cognee/exceptions/exceptions.py (1)

CogneeSystemError (38-49)

cognee/infrastructure/databases/cache/get_cache_engine.py (1)

get_cache_engine (54-67)

cognee/shared/logging_utils.py (2)

get_logger (212-224)

info (205-205)

examples/python/run_custom_pipeline_example.py (8)

cognee/modules/users/methods/get_default_user.py (1)

get_default_user (13-36)

cognee/shared/logging_utils.py (1)

setup_logging (288-555)

cognee/modules/pipelines/tasks/task.py (1)

Task (5-97)

cognee/modules/search/types/SearchType.py (1)

SearchType (4-19)

cognee/tasks/ingestion/ingest_data.py (1)

ingest_data (25-199)

cognee/tasks/ingestion/resolve_data_directories.py (1)

resolve_data_directories (10-84)

cognee/modules/run_custom_pipeline/run_custom_pipeline.py (1)

run_custom_pipeline (14-69)

cognee/api/v1/cognify/cognify.py (1)

get_default_tasks (246-297)

cognee/api/client.py (1)

cognee/shared/logging_utils.py (2)

setup_logging (288-555)

info (205-205)

cognee/modules/users/methods/get_default_user.py (1)

cognee/modules/users/models/User.py (1)

User (13-40)

cognee/tests/unit/modules/memify_tasks/test_cognify_session.py (2)

cognee/tasks/memify/cognify_session.py (1)

cognify_session (9-41)

cognee/exceptions/exceptions.py (2)

CogneeValidationError (52-63)

CogneeSystemError (38-49)

cognee/tasks/memify/cognify_session.py (2)

cognee/exceptions/exceptions.py (2)

CogneeValidationError (52-63)

CogneeSystemError (38-49)

cognee/shared/logging_utils.py (3)

get_logger (212-224)

info (205-205)

debug (209-209)

cognee/modules/chunking/models/DocumentChunk.py (1)

cognee/infrastructure/engine/models/Edge.py (1)

Edge (5-38)

cognee/__init__.py (1)

cognee/modules/run_custom_pipeline/run_custom_pipeline.py (1)

run_custom_pipeline (14-69)

examples/python/conversation_session_persistence_example.py (5)

cognee/api/v1/visualize/visualize.py (1)

visualize_graph (14-27)

cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py (1)

persist_sessions_in_knowledge_graph_pipeline (19-55)

cognee/modules/search/types/SearchType.py (1)

SearchType (4-19)

cognee/modules/users/methods/get_default_user.py (1)

get_default_user (13-36)

cognee/shared/logging_utils.py (1)

get_logger (212-224)

cognee/modules/graph/utils/expand_with_nodes_and_edges.py (1)

cognee/infrastructure/engine/models/Edge.py (1)

Edge (5-38)

cognee/tasks/memify/__init__.py (2)

cognee/tasks/memify/cognify_session.py (1)

cognify_session (9-41)

cognee/tasks/memify/extract_user_sessions.py (1)

extract_user_sessions (12-73)

cognee/modules/chunking/text_chunker_with_overlap.py (4)

cognee/shared/logging_utils.py (1)

get_logger (212-224)

cognee/tasks/chunks/chunk_by_paragraph.py (1)

chunk_by_paragraph (7-96)

cognee/modules/chunking/Chunker.py (1)

Chunker (1-12)

cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py (1)

get_chunk_data (272-275)

cognee/tests/test_load.py (3)

cognee/modules/search/types/SearchType.py (1)

SearchType (4-19)

cognee/shared/logging_utils.py (1)

get_logger (212-224)

cognee/api/v1/config/config.py (2)

data_root_directory (36-38)

system_root_directory (18-33)

cognee/modules/retrieval/cypher_search_retriever.py (2)

cognee/infrastructure/databases/graph/kuzu/adapter.py (1)

query (210-278)

cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1)

query (100-128)

cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py (7)

cognee/context_global_variables.py (2)

set_database_global_context_variables (53-113)

set_session_user_context_variable (23-24)

cognee/exceptions/exceptions.py (1)

CogneeValidationError (52-63)

cognee/modules/data/methods/get_authorized_existing_datasets.py (1)

get_authorized_existing_datasets (11-39)

cognee/shared/logging_utils.py (2)

get_logger (212-224)

info (205-205)

cognee/modules/pipelines/tasks/task.py (1)

Task (5-97)

cognee/tasks/memify/extract_user_sessions.py (1)

extract_user_sessions (12-73)

cognee/tasks/memify/cognify_session.py (1)

cognify_session (9-41)

cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py (2)

cognee/tasks/memify/extract_user_sessions.py (1)

extract_user_sessions (12-73)

cognee/exceptions/exceptions.py (1)

CogneeSystemError (38-49)

cognee/tests/unit/infrastructure/databases/test_index_graph_edges.py (1)

cognee/tasks/storage/index_graph_edges.py (1)

index_graph_edges (42-77)

cognee/tests/test_conversation_history.py (3)

cognee/infrastructure/databases/graph/get_graph_engine.py (1)

get_graph_engine (10-24)

cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py (1)

persist_sessions_in_knowledge_graph_pipeline (19-55)

cognee/infrastructure/databases/vector/get_vector_engine.py (1)

get_vector_engine (5-7)

cognee/modules/graph/utils/resolve_edges_to_text.py (2)

cognee/infrastructure/engine/models/Edge.py (1)

Edge (5-38)

cognee/modules/retrieval/graph_completion_retriever.py (1)

resolve_edges_to_text (60-74)

cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (1)

cognee/infrastructure/databases/graph/neptune_driver/adapter.py (1)

query (165-197)

cognee/modules/users/methods/get_authenticated_user.py (2)

cognee/context_global_variables.py (1)

backend_access_control_enabled (36-50)

cognee/shared/logging_utils.py (1)

get_logger (212-224)

cognee/tests/unit/modules/retrieval/structured_output_test.py (9)

cognee/infrastructure/engine/models/DataPoint.py (1)

DataPoint (20-220)

cognee/modules/data/processing/document_types/TextDocument.py (1)

TextDocument (6-22)

cognee/modules/retrieval/graph_completion_cot_retriever.py (1)

GraphCompletionCotRetriever (39-235)

cognee/modules/retrieval/graph_completion_retriever.py (1)

GraphCompletionRetriever (28-284)

cognee/modules/retrieval/graph_completion_context_extension_retriever.py (1)

GraphCompletionContextExtensionRetriever (17-174)

cognee/modules/retrieval/EntityCompletionRetriever.py (1)

EntityCompletionRetriever (20-165)

cognee/modules/retrieval/temporal_retriever.py (1)

TemporalRetriever (26-214)

cognee/modules/retrieval/completion_retriever.py (1)

CompletionRetriever (20-147)

cognee/api/v1/config/config.py (2)

system_root_directory (18-33)

data_root_directory (36-38)

cognee/tests/unit/modules/pipelines/test_data_item_label.py (2)

cognee/modules/pipelines/models/DataItem.py (1)

DataItem (6-11)

cognee/modules/pipelines/models/DataItemStatus.py (1)

DataItemStatus (4-5)

cognee/modules/retrieval/graph_completion_cot_retriever.py (3)

cognee/modules/retrieval/utils/completion.py (1)

generate_completion (6-28)

cognee/modules/retrieval/completion_retriever.py (1)

get_completion (77-147)

cognee/modules/retrieval/graph_completion_retriever.py (1)

get_completion (144-218)

🪛 Pylint (4.0.2)

cognee/modules/run_custom_pipeline/run_custom_pipeline.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

[refactor] 14-14: Too many arguments (9/5)

(R0913)

[refactor] 14-14: Too many positional arguments (9/5)

(R0917)

cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

[refactor] 239-239: Too many local variables (18/15)

(R0914)

[refactor] 243-243: Too few public methods (0/2)

(R0903)

cognee/modules/pipelines/models/DataItem.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/modules/search/methods/search.py

[refactor] 159-202: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it

(R1705)

cognee/context_global_variables.py

[refactor] 38-49: Unnecessary "elif" after "return", remove the leading "el" from "elif"

(R1705)

cognee/tasks/storage/index_graph_edges.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/tests/unit/modules/chunking/test_text_chunker.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

[refactor] 210-210: Too many local variables (17/15)

(R0914)

cognee/tests/unit/infrastructure/databases/test_index_data_points.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

[refactor] 7-7: Too few public methods (0/2)

(R0903)

cognee/tasks/memify/extract_user_sessions.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

examples/python/run_custom_pipeline_example.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/tests/unit/modules/memify_tasks/test_cognify_session.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/tasks/memify/cognify_session.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

examples/python/conversation_session_persistence_example.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/infrastructure/engine/models/Edge.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/modules/retrieval/base_graph_retriever.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/modules/chunking/text_chunker_with_overlap.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

[refactor] 12-12: Too many arguments (6/5)

(R0913)

[refactor] 12-12: Too many positional arguments (6/5)

(R0917)

[error] 72-72: Instance of 'TextChunkerWithOverlap' has no 'chunk_index' member

(E1101)

[error] 76-76: Instance of 'TextChunkerWithOverlap' has no 'chunk_index' member

(E1101)

[error] 109-109: Instance of 'TextChunkerWithOverlap' has no 'chunk_index' member

(E1101)

[refactor] 11-11: Too few public methods (1/2)

(R0903)

cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/tests/unit/api/test_conditional_authentication_endpoints.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/modules/graph/utils/resolve_edges_to_text.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/modules/retrieval/utils/completion.py

[refactor] 6-6: Too many arguments (7/5)

(R0913)

[refactor] 6-6: Too many positional arguments (7/5)

(R0917)

cognee/tests/unit/modules/retrieval/structured_output_test.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

[refactor] 144-144: Too few public methods (0/2)

(R0903)

[refactor] 147-147: Too few public methods (0/2)

(R0903)

[refactor] 128-128: Too few public methods (1/2)

(R0903)

cognee/tests/unit/modules/pipelines/test_data_item_label.py

[error] 1-1: Unrecognized option found: suggestion-mode

(E0015)

cognee/modules/retrieval/graph_completion_cot_retriever.py

[refactor] 167-167: Too many arguments (6/5)

(R0913)

[refactor] 167-167: Too many positional arguments (6/5)

(R0917)

🪛 Ruff (0.14.4)

cognee/modules/run_custom_pipeline/run_custom_pipeline.py

15-15: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py

32-32: Unused function argument: text

(ARG001)

cognee/context_global_variables.py

46-48: Avoid specifying long messages outside the exception class

(TRY003)

cognee/tests/unit/modules/chunking/test_text_chunker.py

163-163: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)

cognee/tests/unit/infrastructure/databases/test_index_data_points.py

9-9: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

cognee/tasks/memify/extract_user_sessions.py

38-38: Abstract raise to an inner function

(TRY301)

44-47: Abstract raise to an inner function

(TRY301)

61-61: Do not catch blind exception: Exception

(BLE001)

62-62: Use explicit conversion flag

Replace with conversion flag

(RUF010)

71-71: Do not catch blind exception: Exception

(BLE001)

72-72: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

72-72: Use explicit conversion flag

Replace with conversion flag

(RUF010)

73-73: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

73-73: Use explicit conversion flag

Replace with conversion flag

(RUF010)

cognee/tasks/memify/cognify_session.py

28-28: Abstract raise to an inner function

(TRY301)

39-39: Do not catch blind exception: Exception

(BLE001)

40-40: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

40-40: Use explicit conversion flag

Replace with conversion flag

(RUF010)

41-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

41-41: Use explicit conversion flag

Replace with conversion flag

(RUF010)

cognee/modules/chunking/text_chunker_with_overlap.py

18-18: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

72-72: Use explicit conversion flag

Replace with conversion flag

(RUF010)

82-82: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

83-83: Use raise without specifying exception name

Remove exception name

(TRY201)

cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py

32-32: Use explicit conversion flag

Replace with conversion flag

(RUF010)

cognee/tests/unit/api/test_conditional_authentication_endpoints.py

244-244: Unused method argument: client

(ARG002)

cognee/modules/graph/utils/resolve_edges_to_text.py

10-10: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)
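
Several of the Ruff findings above (BLE001, TRY400, B904, RUF010) point at the same exception-handling pattern. The following is a minimal sketch of a handler that would satisfy them, not the project's actual code: SessionError and load_session are hypothetical names, and the standard library logging module stands in for cognee.shared.logging_utils.

```python
import logging

logger = logging.getLogger("ruff_sketch")


class SessionError(Exception):
    """Hypothetical domain error standing in for the project's own exception types."""


def load_session(raw: dict) -> str:
    """Return the session id from a payload, or raise SessionError."""
    try:
        return raw["session_id"]
    except KeyError as error:  # BLE001: catch a specific exception, not a blind Exception
        # TRY400: logger.exception records the traceback along with the message
        logger.exception("Session lookup failed: %s", error)
        # RUF010: use the !s conversion flag instead of str(error)
        # B904: chain the original error with "from error"
        raise SessionError(f"Invalid session payload: {error!s}") from error
```

Chaining with "from error" keeps the original KeyError available as __cause__, so the traceback distinguishes the lookup failure from any error raised while handling it.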

coderabbitai · 2025-11-11T19:15:28Z

cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py

+    async def is_empty(self) -> bool:
+        query = """
+        MATCH (n)
+        RETURN true
+        LIMIT 1;
+        """
+        query_result = await self._client.query(query)
+        return len(query_result) == 0


⚠️ Potential issue | 🔴 Critical

Critical: Remove await from synchronous method call.

Line 425 attempts to await a synchronous method (self._client.query()), which will raise a TypeError at runtime. All other calls to self._client.query() in this file (lines 202, 227, 369, 416) are synchronous and do not use await.

Apply this diff to fix the issue:

-        query_result = await self._client.query(query)
+        query_result = self._client.query(query)
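
To illustrate why the await matters, here is a small self-contained sketch. FakeClient is a hypothetical stand-in for a client whose query() is synchronous and returns a list; it is not the Neptune Analytics client itself.

```python
import asyncio


class FakeClient:
    """Hypothetical synchronous client; query() returns a plain list."""

    def query(self, query: str) -> list:
        return []


async def is_empty_broken(client: FakeClient) -> bool:
    # Awaiting a plain list raises TypeError: a list is not awaitable
    return len(await client.query("MATCH (n) RETURN true LIMIT 1")) == 0


async def is_empty_fixed(client: FakeClient) -> bool:
    # The synchronous call is used directly, without await
    return len(client.query("MATCH (n) RETURN true LIMIT 1")) == 0


def run_demo():
    """Run both variants and return (error from broken variant, result of fixed variant)."""
    client = FakeClient()
    try:
        asyncio.run(is_empty_broken(client))
        broken_error = None
    except TypeError as error:
        broken_error = error
    fixed_result = asyncio.run(is_empty_fixed(client))
    return broken_error, fixed_result
```

The method can stay declared as async def so callers can still await is_empty(); only the inner call to the synchronous client drops the await.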

Optional: Consider optimizing the query.

The current query could be simplified for better readability:

async def is_empty(self) -> bool:
    query = "MATCH (n) RETURN count(n) AS node_count"
    query_result = self._client.query(query)
    return query_result[0].get("node_count", 0) == 0

🤖 Prompt for AI Agents

In cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py around lines 419 to 426, remove the incorrect await on the synchronous self._client.query() call (it raises TypeError at runtime); call self._client.query(query) synchronously and return the boolean based on its result length. Optionally replace the MATCH...RETURN true LIMIT 1 query with a count-based query ("MATCH (n) RETURN count(n) AS node_count") and check node_count == 0 to determine emptiness for clearer semantics, keeping in mind that counting every node can be slower than LIMIT 1 on large graphs.

coderabbitai · 2025-11-11T19:15:28Z

cognee/modules/chunking/text_chunker_with_overlap.py

+        get_text: callable,
+        max_chunk_size: int,
+        chunk_overlap_ratio: float = 0.0,
+        get_chunk_data: callable = None,


⚠️ Potential issue | 🟡 Minor

Add explicit Optional type annotation.

The get_chunk_data parameter defaults to None, so it should carry an explicit Optional type annotation rather than rely on an implicit Optional, per PEP 484 and Ruff rule RUF013.

Apply this diff:

         document,
         get_text: callable,
         max_chunk_size: int,
         chunk_overlap_ratio: float = 0.0,
-        get_chunk_data: callable = None,
+        get_chunk_data: Optional[callable] = None,
     ):

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools

🪛 Ruff (0.14.4)

18-18: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

🤖 Prompt for AI Agents

In cognee/modules/chunking/text_chunker_with_overlap.py around line 18, the parameter get_chunk_data is currently set to a default of None without an explicit Optional type; update its annotation to use typing.Optional (e.g., get_chunk_data: Optional[Callable[..., Any]] = None or Optional[Callable] = None) and ensure Optional, Callable and Any are imported from typing at the top of the file (add missing imports if necessary).
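
As a rough sketch of the annotation described above, the constructor signature might look like the following. ChunkerSketch is a hypothetical class used only for illustration; the parameter names follow the diff, but the body is an assumption and does not reflect the real TextChunkerWithOverlap.

```python
from typing import Any, Callable, Optional, get_type_hints


class ChunkerSketch:
    """Hypothetical class showing an explicit Optional callable parameter."""

    def __init__(
        self,
        document: Any,
        get_text: Callable[[], Any],
        max_chunk_size: int,
        chunk_overlap_ratio: float = 0.0,
        # Explicit Optional: the None default is now visible in the type
        get_chunk_data: Optional[Callable[..., Any]] = None,
    ):
        self.document = document
        self.get_text = get_text
        self.max_chunk_size = max_chunk_size
        self.chunk_overlap_ratio = chunk_overlap_ratio
        self.get_chunk_data = get_chunk_data
```

Using typing.Callable rather than the builtin callable() function also gives type checkers something they can verify, since callable is a predicate and not a type.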

coderabbitai · 2025-11-11T19:15:28Z

cognee/modules/chunking/text_chunker_with_overlap.py

+
+    def _create_chunk_from_accumulation(self):
+        """Create a DocumentChunk from current accumulated chunk_data."""
+        chunk_text = " ".join(chunk["text"] for chunk in self._accumulated_chunk_data)


🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Verify text joining behavior.

Joining accumulated chunks with a single space " ".join(...) could lead to double spaces if chunks already have trailing or leading whitespace. Consider whether chunks should be stripped or joined differently based on the expected text reconstruction behavior.
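
The spacing concern can be reproduced in isolation. This is only a sketch with made-up chunk dictionaries; whether the real accumulated chunks ever carry leading or trailing whitespace is exactly what the analysis below tries to establish. join_raw and join_stripped are hypothetical helper names.

```python
def join_raw(chunks: list) -> str:
    """Join chunk texts with a single space, as in the reviewed line."""
    return " ".join(chunk["text"] for chunk in chunks)


def join_stripped(chunks: list) -> str:
    """Strip each chunk text first and skip chunks that are empty after stripping."""
    stripped = (chunk["text"].strip() for chunk in chunks)
    return " ".join(text for text in stripped if text)


# Made-up input: one chunk ends with a space, the next starts with one
sample_chunks = [{"text": "First sentence. "}, {"text": " Second sentence."}]
raw_text = join_raw(sample_chunks)
clean_text = join_stripped(sample_chunks)
```

With this input the raw join yields three consecutive spaces between the sentences (trailing space, joiner, leading space), while the stripped join yields exactly one.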

Based on learnings, changes to chunking implementation may require updating test ground truth values. Run this script to check existing test expectations:

🏁 Script executed:

#!/bin/bash
# Search for tests that verify chunk text output to understand expected spacing behavior
rg -n -A 5 "chunk.*text" cognee/tests/unit/modules/chunking/ --type py | grep -E "(assert|expected|\.text)"

Length of output: 7365

I need to examine the implementation to understand how chunks are accumulated and whether stripping is necessary at line 87.

🏁 Script executed:

#!/bin/bash
# View the text_chunker_with_overlap.py implementation around line 87
cat cognee/modules/chunking/text_chunker_with_overlap.py | sed -n '60,100p'

Length of output: 1592

🏁 Script executed:

#!/bin/bash
# View more context to understand how _accumulated_chunk_data is populated
cat cognee/modules/chunking/text_chunker_with_overlap.py | head -150 | tail -100

Length of output: 3897

🏁 Script executed:

#!/bin/bash
# Search for where chunks are added to _accumulated_chunk_data
rg -n "_accumulated_chunk_data" cognee/modules/chunking/text_chunker_with_overlap.py -B 3 -A 3

Length of output: 2385

🏁 Script executed:

#!/bin/bash
# Check if there's stripping being done elsewhere in the chunk text before accumulation
rg -n "\.strip" cognee/modules/chunking/text_chunker_with_overlap.py

Length of output: 0

🏁 Script executed:

#!/bin/bash
# Look at actual tests to understand how chunk text looks like in practice
cat cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py | sed -n '70,90p'

Length of output: 829

🏁 Script executed:

#!/bin/bash
# Check what the chunk dictionary contains - look for chunk["text"] pattern
rg -n 'chunk\["text"\]' cognee/modules/chunking/ -B 2 -A 2

Length of output: 1519

🏁 Script executed:

#!/bin/bash
# Find where the chunk dictionary gets created
rg -n '"text"' cognee/modules/chunking/text_chunker_with_overlap.py -B 2 -A 2

Length of output: 730

🏁 Script executed:

#!/bin/bash
# Inspect the accumulate_chunk_data method
rg -n "def.*accum" cognee/modules/chunking/text_chunker_with_overlap.py -A 20

Length of output: 2323

🏁 Script executed:

#!/bin/bash
# Look for the place where text is extracted for chunk accumulation
rg -n "def process" cognee/modules/chunking/text_chunker_with_overlap.py -A 50 | head -60

Length of output: 0

🏁 Script executed:

#!/bin/bash
# Check actual chunk content by looking at a simple test case
cat cognee/tests/unit/modules/chunking/test_text_chunker.py | sed -n '60,100p'

Length of output: 1606

🏁 Script executed:

#!/bin/bash
# Search for where chunk text is actually set when chunks are created
rg -n "chunk_text\s*=" cognee/modules/chunking/text_chunker_with_overlap.py -B 2 -A 2

Length of output: 286

🏁 Script executed:

#!/bin/bash
# Look at all chunk creation in the file
rg -n "chunk_text" cognee/modules/chunking/text_chunker_with_overlap.py

Length of output: 121

🏁 Script executed:

#!/bin/bash
# Full context of _create_chunk_from_accumulation and surrounding methods
cat cognee/modules/chunking/text_chunker_with_overlap.py | sed -n '50,120p'

Length of output: 2664

🏁 Script executed:

#!/bin/bash
# Look at the entire file to understand structure
wc -l cognee/modules/chunking/text_chunker_with_overlap.py

Length of output: 56

🏁 Script executed:

#!/bin/bash
# Get the full file
cat cognee/modules/chunking/text_chunker_with_overlap.py

Length of output: 4663

🏁 Script executed:

#!/bin/bash
# Search for what gets passed to _accumulated_chunk_data.append
rg -n "_accumulated_chunk_data\.append" cognee/modules/chunking/text_chunker_with_overlap.py -B 5 -A 2

Length of output: 405

🏁 Script executed:

#!/bin/bash
# Search for strip calls more broadly in the chunking logic
rg -n "strip" cognee/modules/chunking/ -type py

Length of output: 27

🏁 Script executed:

#!/bin/bash
# Look at split_text or similar methods to understand what gets accumulated
rg -n "def split_text|def process|def chunk" cognee/modules/chunking/text_chunker_with_overlap.py -A 30 | head -80

Length of output: 0

🏁 Script executed:

#!/bin/bash
# Find where text is being processed before being added
rg -n "for.*in.*\.split\(" cognee/modules/chunking/text_chunker_with_overlap.py -B 2 -A 5

Length of output: 0

🏁 Script executed:

#!/bin/bash
# Simpler approach - see what _accumulated_chunk_data items look like
rg -n "_accumulated_chunk_data" cognee/modules/chunking/text_chunker_with_overlap.py

Length of output: 550

🏁 Script executed:

#!/bin/bash
# Get original review comment file to see full context
cat cognee/modules/chunking/text_chunker_with_overlap.py | sed -n '75,95p'

Length of output: 784

🏁 Script executed:

#!/bin/bash
# Find chunk_by_paragraph implementation
fd -name "*.py" -path "*/cognee/tasks/chunks*" | head -20

Length of output: 230

🏁 Script executed:

#!/bin/bash
# Search for chunk_by_paragraph function
rg -n "def chunk_by_paragraph" cognee/

Length of output: 67

🏁 Script executed:

#!/bin/bash
# Look for where chunk_by_paragraph is defined and what it returns
rg -n "chunk_by_paragraph" cognee/ -B 2 -A 10 | head -60

Length of output: 5369

🏁 Script executed:

#!/bin/bash
# Check what the chunk dictionary structure is - look for batch_paragraphs usage
rg -n "batch_paragraphs" cognee/ -B 3 -A 10 | head -80

Length of output: 7890

🏁 Script executed:

#!/bin/bash
# Look at TextChunker.py to see the pattern there (similar joining at line 40)
cat cognee/modules/chunking/TextChunker.py | head -80

Length of output: 3424

🏁 Script executed:

#!/bin/bash
# Check if there's any text processing or stripping in chunk_by_paragraph or related functions
rg -n "chunk_by_paragraph\|def.*chunk.*paragraph" cognee/tasks/chunks/ -A 15 | head -100

Length of output: 0

🏁 Script executed:

#!/bin/bash
# View chunk_by_paragraph implementation
cat cognee/tasks/chunks/chunk_by_paragraph.py

Length of output: 3414

🏁 Script executed:

#!/bin/bash
# Also check if there are any recent changes or comments about spacing
rg -n "batch_paragraphs|spacing|double.*space" cognee/tasks/chunks/chunk_by_paragraph.py -B 2 -A 3

Length of output: 1137

🏁 Script executed:

#!/bin/bash
# Look for text reconstruction tests or exact text matching in TextChunkerWithOverlap tests
rg -n "reconstructed|all_text|join.*chunk.*text|exact" cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py -B 2 -A 2

Length of output: 1962

🏁 Script executed:

#!/bin/bash
# Check if there are any tests that do strip() on chunks in text_chunker_with_overlap tests
rg -n "\.strip" cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py -B 2 -A 2

Length of output: 0

🏁 Script executed:

#!/bin/bash
# Compare exact pattern in both test files
echo "=== TextChunker test pattern (line 168-170) ===" && sed -n '168,170p' cognee/tests/unit/modules/chunking/test_text_chunker.py
echo ""
echo "=== Searching for similar pattern in TextChunkerWithOverlap tests ===" && rg -n "\.join.*chunk" cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py -B 2 -A 2

Length of output: 323

🏁 Script executed:

#!/bin/bash
# Check how chunk_by_sentence returns text (with or without spaces)
cat cognee/tasks/chunks/chunk_by_sentence.py | head -60

Length of output: 2181

Strip chunks before joining to match TextChunker behavior and ensure correct spacing.

Line 87 joins chunks with a single space without stripping, creating potential double-spaces from sentence boundaries. TextChunker (line 40) explicitly strips each chunk before joining. Update line 87 to:

chunk_text = " ".join(chunk["text"].strip() for chunk in self._accumulated_chunk_data)

Additionally, add a test case to verify exact text reconstruction doesn't break with overlap, similar to test_text_chunker.py lines 168-170.

🤖 Prompt for AI Agents

In cognee/modules/chunking/text_chunker_with_overlap.py around line 87, the code joins chunk["text"] values without stripping, which can produce double spaces at sentence boundaries; change the join to strip each chunk before joining (i.e., use chunk["text"].strip() for each element) so spacing matches TextChunker behavior, and add a unit test similar to test_text_chunker.py lines 168-170 to verify exact text reconstruction with overlaps.
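To make the spacing concern concrete, here is a minimal sketch of the two join variants. The sample `accumulated` list and the helper names `join_raw` and `join_stripped` are made up for illustration; only the `"text"` key and the `" ".join(...)` pattern come from the review comment.

```python
# Hypothetical accumulated chunk data; real chunks come from the chunker itself.
accumulated = [
    {"text": "First sentence. "},   # trailing space left by a sentence boundary
    {"text": " Second sentence."},  # leading space from the next piece
]


def join_raw(chunks):
    # Joins the texts as-is, so boundary whitespace is kept.
    return " ".join(chunk["text"] for chunk in chunks)


def join_stripped(chunks):
    # Strips each piece first, so exactly one space separates pieces.
    return " ".join(chunk["text"].strip() for chunk in chunks)


raw = join_raw(accumulated)
clean = join_stripped(accumulated)
print(repr(raw))
print(repr(clean))
```

Note that stripping also removes any intentional leading or trailing whitespace, which matters if a test expects byte-for-byte reconstruction of the input.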

coderabbitai · 2025-11-11T19:15:29Z

cognee/modules/retrieval/cypher_search_retriever.py

-            result = await graph_engine.query(query)
+            result = jsonable_encoder(await graph_engine.query(query))
        except Exception as e:
            logger.error("Failed to execture cypher search retrieval: %s", str(e))


⚠️ Potential issue | 🟡 Minor

Fix typo in error message.

The error message contains a typo: "execture" should be "execute".

Apply this diff:

- logger.error("Failed to execture cypher search retrieval: %s", str(e))
+ logger.error("Failed to execute cypher search retrieval: %s", str(e))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

logger.error("Failed to execture cypher search retrieval: %s", str(e))

logger.error("Failed to execute cypher search retrieval: %s", str(e))

🧰 Tools

🪛 Ruff (0.14.4)

57-57: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🤖 Prompt for AI Agents

In cognee/modules/retrieval/cypher_search_retriever.py around line 57, the logger error message has a typo ("execture"); update the message text to "execute" so it reads something like "Failed to execute cypher search retrieval: %s", preserving the existing logger.error call and error interpolation (str(e)).
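The Ruff hint above (TRY400) points at a related option: inside an `except` block, `logging.exception` logs at ERROR level and appends the traceback automatically. The sketch below is only an illustration of that hint; `run_query` and `safe_query` are hypothetical stand-ins, not the retriever's real code.

```python
import logging

logger = logging.getLogger("cypher_search_example")


def run_query(query):
    # Stand-in for a failing graph query; not the real graph engine call.
    raise RuntimeError("syntax error near " + query)


def safe_query(query):
    try:
        return run_query(query)
    except Exception:
        # Logs the message at ERROR level plus the active traceback.
        logger.exception("Failed to execute cypher search retrieval")
        return None


result = safe_query("MATCH (n) RETURN n")
```

Whether the retriever should swallow the exception or re-raise it is a separate decision that this sketch does not settle.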

coderabbitai · 2025-11-11T19:15:29Z

cognee/modules/retrieval/graph_completion_cot_retriever.py

        context: Optional[List[Edge]] = None,
        session_id: Optional[str] = None,
-        max_iter: int = 4,
+        max_iter=4,


⚠️ Potential issue | 🟡 Minor

Add type annotation for max_iter parameter.

The max_iter parameter is missing a type annotation. Based on its default value and usage, it should be int.

Apply this diff:

- max_iter=4,
+ max_iter: int = 4,

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

max_iter=4,

max_iter: int = 4,

🤖 Prompt for AI Agents

In cognee/modules/retrieval/graph_completion_cot_retriever.py around line 172, the parameter declaration "max_iter=4," lacks a type annotation; update the function/method signature to annotate it as an integer (e.g., change to "max_iter: int = 4,") so the parameter is explicitly typed as int.

coderabbitai · 2025-11-11T19:15:29Z

cognee/tests/test_load.py

+async def main():
+    data_directory_path = os.path.join(pathlib.Path(__file__).parent, ".data_storage/test_load")
+    cognee.config.data_root_directory(data_directory_path)
+
+    cognee_directory_path = os.path.join(pathlib.Path(__file__).parent, ".cognee_system/test_load")
+    cognee.config.system_root_directory(cognee_directory_path)
+
+    num_of_pdfs = 10
+    num_of_reps = 5
+    upper_boundary_minutes = 10
+    average_minutes = 8
+
+    recorded_times = []
+    for _ in range(num_of_reps):
+        await cognee.prune.prune_data()
+        await cognee.prune.prune_system(metadata=True)
+
+        s3_input = "s3://cognee-test-load-s3-bucket"
+        await cognee.add(s3_input)
+
+        recorded_times.append(await process_and_search(num_of_pdfs))
+
+    average_recorded_time = sum(recorded_times) / len(recorded_times)
+
+    assert average_recorded_time <= average_minutes * 60
+
+    assert all(rec_time <= upper_boundary_minutes * 60 for rec_time in recorded_times)
+


⚠️ Potential issue | 🟠 Major

Convert this load check into an actual pytest test and drop the live S3 dependency.

Pytest will never execute main() because it does not match the test_* naming convention, so the assertions here never run and the module contributes no coverage. Please move this logic into an async test_* function (decorated with @pytest.mark.asyncio) so the checks execute during CI. In addition, the current implementation calls cognee.add("s3://cognee-test-load-s3-bucket"); our test environment has no AWS credentials, so this will raise on the first run. Replace the live S3 call with a fixture or mock dataset that works offline.

🤖 Prompt for AI Agents

In cognee/tests/test_load.py around lines 32 to 59, the test logic is defined inside an async main() that pytest will never run and it calls a live S3 path; convert this into an async pytest test function and remove the live AWS dependency. Replace main() with an async def test_load_performance(...) decorated with @pytest.mark.asyncio, import pytest, and either use a local test fixture that prepares a mock dataset in the configured data directories or monkeypatch cognee.add to accept a local path instead of "s3://..."; keep the prune/setup, run process_and_search the same number of reps, compute average_recorded_time and assert the time thresholds as before, and ensure any setup/teardown of .data_storage/.cognee_system is handled by fixtures so the test runs offline and is executed by CI.
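One possible shape for the requested change, as a rough offline sketch. `fake_process_and_search`, `check_timings`, and `run_reps` are hypothetical names, and the stub replaces both the S3 input and the real `process_and_search`; the thresholds are the ones from the diff above. With the pytest-asyncio plugin the test could instead be an `async def` marked with `@pytest.mark.asyncio`, as the review suggests; the plain `asyncio.run` form is used here so the block needs only the standard library.

```python
import asyncio
import time


async def fake_process_and_search(num_of_pdfs):
    # Hypothetical offline stand-in for the PR's process_and_search helper.
    start = time.perf_counter()
    await asyncio.sleep(0)  # yields to the event loop; no network or S3 access
    return time.perf_counter() - start


def check_timings(recorded_times, average_minutes, upper_boundary_minutes):
    # Same two thresholds as the original main(), expressed in seconds.
    average_recorded_time = sum(recorded_times) / len(recorded_times)
    assert average_recorded_time <= average_minutes * 60
    assert all(t <= upper_boundary_minutes * 60 for t in recorded_times)
    return average_recorded_time


async def run_reps(num_of_reps, num_of_pdfs):
    recorded = []
    for _ in range(num_of_reps):
        recorded.append(await fake_process_and_search(num_of_pdfs))
    return recorded


def test_load_performance():
    # The test_ prefix is what lets pytest collect this function.
    recorded = asyncio.run(run_reps(num_of_reps=5, num_of_pdfs=10))
    assert len(recorded) == 5
    check_timings(recorded, average_minutes=8, upper_boundary_minutes=10)
```

Keeping the threshold logic in a plain function makes it testable on its own, independent of how the timings are produced.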

coderabbitai · 2025-11-11T19:15:30Z

cognee/tests/unit/modules/chunking/test_text_chunker.py

+    assert len(chunks) == 20, (
+        "Should batch paragraphs (2 per chunk: 12 words × 2 tokens = 24, 24 + 1 joiner + 24 = 49)"
+    )


⚠️ Potential issue | 🟠 Major

Replace the Unicode multiplication sign so Ruff passes.

Line [163] uses the × character in the assertion message. Ruff (RUF001) flags this as ambiguous Unicode, and the lint step will fail. Switch to plain ASCII (x) or spell out the multiplication instead.

- assert len(chunks) == 20, (
-     "Should batch paragraphs (2 per chunk: 12 words × 2 tokens = 24, 24 + 1 joiner + 24 = 49)"
- )
+ assert len(chunks) == 20, (
+     "Should batch paragraphs (2 per chunk: 12 words x 2 tokens = 24, 24 + 1 joiner + 24 = 49)"
+ )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

assert len(chunks) == 20, (

"Should batch paragraphs (2 per chunk: 12 words × 2 tokens = 24, 24 + 1 joiner + 24 = 49)"

)

assert len(chunks) == 20, (

"Should batch paragraphs (2 per chunk: 12 words x 2 tokens = 24, 24 + 1 joiner + 24 = 49)"

)

🧰 Tools

🪛 Ruff (0.14.4)

163-163: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?

(RUF001)

🤖 Prompt for AI Agents

In cognee/tests/unit/modules/chunking/test_text_chunker.py around lines 162 to 164, the assertion message uses the Unicode multiplication sign "×" which Ruff flags as ambiguous; replace that character with plain ASCII "x" or spell out "times" (e.g., "2 x tokens" or "two times") so the lint rule RUF001 no longer fails and update the assertion message accordingly.

coderabbitai · 2025-11-11T19:15:30Z

cognee/tests/unit/modules/retrieval/structured_output_test.py

+def _assert_string_answer(answer: list[str]):
+    assert isinstance(answer, list), f"Expected str, got {type(answer).__name__}"
+    assert all(isinstance(item, str) and item.strip() for item in answer), "Items should be strings"
+    assert all(item.strip() for item in answer), "Items should not be empty"


⚠️ Potential issue | 🟡 Minor

Fix assertion message.

Line 32 checks that answer is a list, but the error message says "Expected str". This could be confusing if the assertion fails.

Apply this diff:

  def _assert_string_answer(answer: list[str]):
-     assert isinstance(answer, list), f"Expected str, got {type(answer).__name__}"
+     assert isinstance(answer, list), f"Expected list, got {type(answer).__name__}"
      assert all(isinstance(item, str) and item.strip() for item in answer), "Items should be strings"
      assert all(item.strip() for item in answer), "Items should not be empty"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _assert_string_answer(answer: list[str]):

assert isinstance(answer, list), f"Expected str, got {type(answer).__name__}"

assert all(isinstance(item, str) and item.strip() for item in answer), "Items should be strings"

assert all(item.strip() for item in answer), "Items should not be empty"

def _assert_string_answer(answer: list[str]):

assert isinstance(answer, list), f"Expected list, got {type(answer).__name__}"

assert all(isinstance(item, str) and item.strip() for item in answer), "Items should be strings"

assert all(item.strip() for item in answer), "Items should not be empty"

🤖 Prompt for AI Agents

In cognee/tests/unit/modules/retrieval/structured_output_test.py around lines 31 to 34, the first assertion message incorrectly says "Expected str" while checking for a list; update that assertion to report the correct expected type (e.g., "Expected list, got {type(answer).__name__}") so failures accurately describe the mismatch.
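For reference, a small variant of the helper with the corrected message. It also separates the type check from the emptiness check so that each failure message matches what actually failed; that split goes beyond what the review asked for and is only a suggestion. The name `assert_string_answer` is a hypothetical public copy of the test helper.

```python
def assert_string_answer(answer):
    # The message now names the type that is actually being checked.
    assert isinstance(answer, list), f"Expected list, got {type(answer).__name__}"
    # Type check only; emptiness is reported separately below.
    assert all(isinstance(item, str) for item in answer), "Items should be strings"
    assert all(item.strip() for item in answer), "Items should not be empty"


def failure_message(value):
    # Returns the assertion message for a bad value, or None if it passes.
    try:
        assert_string_answer(value)
    except AssertionError as error:
        return str(error)
    return None
```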

coderabbitai · 2025-11-11T19:15:30Z

cognee/tests/unit/modules/retrieval/structured_output_test.py

+    structured_answer = await retriever.get_completion(
+        "When did Steve start working at Figma??", response_model=TestAnswer
+    )


⚠️ Potential issue | 🟡 Minor

Fix typo: double question mark.

Line 81 has two question marks at the end of the query string.

Apply this diff:

  structured_answer = await retriever.get_completion(
-     "When did Steve start working at Figma??", response_model=TestAnswer
+     "When did Steve start working at Figma?", response_model=TestAnswer
  )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

structured_answer = await retriever.get_completion(

"When did Steve start working at Figma??", response_model=TestAnswer

)

structured_answer = await retriever.get_completion(

"When did Steve start working at Figma?", response_model=TestAnswer

)

🤖 Prompt for AI Agents

In cognee/tests/unit/modules/retrieval/structured_output_test.py around lines 80 to 82, the test query string contains a typo with two question marks at the end ("When did Steve start working at Figma??"); change it to a single question mark so the call becomes "When did Steve start working at Figma?" and run tests to ensure no formatting-sensitive assertions fail.

coderabbitai · 2025-11-11T19:15:30Z

examples/python/memify_coding_agent_example.py


    print("Coding rules created by memify:")
-    for coding_rule in coding_rules:
+    for coding_rule in coding_rules[0]["search_result"][0]:


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Add defensive checks for the nested structure access.

The deeply nested access coding_rules[0]["search_result"][0] will raise an IndexError or KeyError if the search returns no results or a different structure. Examples should demonstrate robust handling to prevent crashes when users run them.

Consider adding validation before accessing the nested structure:

- for coding_rule in coding_rules[0]["search_result"][0]:
+ if coding_rules and len(coding_rules) > 0 and "search_result" in coding_rules[0]:
+     search_results = coding_rules[0]["search_result"]
+     if search_results and len(search_results) > 0:
+         for coding_rule in search_results[0]:
+             print("- " + coding_rule)
+ else:
+     print("No coding rules found.")
+     return
+
+ for coding_rule in coding_rules[0]["search_result"][0]:
      print("- " + coding_rule)

Alternatively, verify that the cognee.search() API guarantees this nested structure in all cases by running the following script to check the API implementation and test coverage:

🏁 Script executed:

#!/bin/bash
# Verify the structure returned by cognee.search() for CODING_RULES type

# Find the search implementation
ast-grep --pattern $'async def search($$$): $$$'

# Check if tests validate the nested structure for empty results
rg -nP --type=py -C3 'SearchType\.CODING_RULES|search.*coding.*rules' -g '!**/memify_coding_agent_example.py'

Length of output: 34798

🏁 Script executed:

#!/bin/bash
# Find CodingRulesRetriever class and get_existing_rules method
rg -nP --type=py -C10 'class CodingRulesRetriever' -g '!**/test/**'

Length of output: 1908

🏁 Script executed:

#!/bin/bash
# Get full CodingRulesRetriever implementation
cat cognee/modules/retrieval/coding_rules_retriever.py

Length of output: 1133

🏁 Script executed:

#!/bin/bash
# Find the imported get_existing_rules function
rg -nP --type=py -C15 'async def get_existing_rules|def get_existing_rules' cognee/tasks/codingagents/coding_rule_associations.py 2>/dev/null || rg -nP --type=py -C15 'get_existing_rules' cognee/tasks/codingagents/ -g '*.py' | head -80

Length of output: 1011

🏁 Script executed:

#!/bin/bash
# Find the cognee.search() function that wraps these retrievers
rg -nP --type=py -C20 'async def search\(' cognee/api/v1/search/ -g '*.py' | head -100

Length of output: 7109

🏁 Script executed:

#!/bin/bash
# Find the search function in cognee.modules.search.methods
rg -nP --type=py -C30 'async def search\(' cognee/modules/search/methods/ -g '*.py' | head -150

Length of output: 4601

🏁 Script executed:

#!/bin/bash
# Also check what structure is returned for CODING_RULES type
rg -nP --type=py -B5 -A20 'SearchType\.CODING_RULES' cognee/modules/search/methods/ -g '*.py'

Length of output: 2424

🏁 Script executed:

#!/bin/bash
# Get the full search function implementation
cat cognee/modules/search/methods/search.py

Length of output: 15006

🏁 Script executed:

#!/bin/bash
# Find prepare_search_result implementation
cat cognee/modules/search/utils/prepare_search_result.py

Length of output: 2333

Add defensive checks for the nested structure access.

The deeply nested access coding_rules[0]["search_result"][0] on line 92 will raise an IndexError or iterate over unintended data if the search returns no results or a single result. Examples should demonstrate robust handling to prevent crashes when users run them.

The risks are:

Empty results cause IndexError on [0] access

Single rule result gets wrapped as a string, causing character-by-character iteration

No validation at any nesting level

Consider adding defensive checks:

if coding_rules and len(coding_rules) > 0 and "search_result" in coding_rules[0]:
    rules = coding_rules[0]["search_result"]
    if rules and len(rules) > 0:
        for coding_rule in rules[0]:
            print("- " + coding_rule)

Alternatively, verify the cognee.search() API contract guarantees this nested dict structure in all cases (including empty results) before relying on it in the example.

🤖 Prompt for AI Agents

In examples/python/memify_coding_agent_example.py around line 92, the loop uses coding_rules[0]["search_result"][0] without validating the nested structure which can raise IndexError or iterate characters if a single string is returned; add defensive checks to ensure coding_rules is non-empty, that coding_rules[0] contains "search_result" and that its value is a non-empty list (and that the first element is iterable of rules), then iterate over the validated rules; alternatively normalize the API output into a list before looping (e.g., coerce single-item/string results to a list) and log or handle the empty-case gracefully so the example never crashes.
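The normalization option mentioned above could look roughly like the following. `extract_coding_rules` is a hypothetical helper, and the nested shape it expects (`[{"search_result": [[rule, ...]]}]`) is inferred from the example's indexing rather than from a documented `cognee.search()` contract.

```python
def extract_coding_rules(search_output):
    """Return a flat list of rule strings, or [] when the structure is missing."""
    if not isinstance(search_output, list) or not search_output:
        return []
    first = search_output[0]
    if not isinstance(first, dict):
        return []
    results = first.get("search_result")
    if not isinstance(results, list) or not results:
        return []
    rules = results[0]
    if isinstance(rules, str):
        # A single rule returned as a bare string: wrap it instead of
        # iterating over it character by character.
        return [rules]
    if isinstance(rules, (list, tuple)):
        return [str(rule) for rule in rules]
    return []


# Usage: an empty result prints nothing instead of raising IndexError.
for coding_rule in extract_coding_rules([]):
    print("- " + coding_rule)
```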

pazone · 2025-11-12T08:52:01Z

Hi @mohitk-patwari. Thank you for the contribution. Before we start the deep review, could you please resolve the CodeRabbit issues and decide on the 2 large JSON files you've added? Do we need them in our codebase?

dexters1 · 2025-11-12T10:53:27Z

cognee/modules/pipelines/models/DataItem.py

+from cognee.modules.pipelines.models.DataItemStatus import DataItemStatus
+
+@dataclass
+class DataItem:


The cognee.add function needs to be able to take this class as input for data, and it should also add the name/label information from this class to the Data table row in the relational database.

2wikimultihop_dev.json

dexters1 · 2025-11-12T15:37:12Z

I've changed the PR to draft state. Return it to the ready-for-review state once data in this dataclass can be ingested with cognee.add and the label information is stored in the SQL database.

We'll review it again at that time


gitguardian · 2025-12-01T15:57:10Z

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secret in your pull request

GitGuardian id	GitGuardian status	Secret	Commit	Filename
9573981	Triggered	Generic Password	`d724a58`	.github/workflows/e2e_tests.yml

🛠 Guidelines to remediate hardcoded secrets

Understand the implications of revoking this secret by investigating where it is used in your code.
Replace and store your secret safely. Learn here the best practices.
Revoke and rotate this secret.
If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider

following these best practices for managing and storing secrets including API keys and other credentials
install secret detection on pre-commit to catch secrets before they leave your machine and ease remediation.

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

borisarzentar · 2025-12-01T15:59:12Z

cognee/tests/unit/modules/pipelines/test_data_item_label.py

+        status=DataItemStatus.DATA_ITEM_PROCESSING_COMPLETED,
+        label="Important"
+    )
+    assert item.label == "Important"


Suggestion for a more comprehensive test: add this data with cognee.add, then check which label it has in the relational database.
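The round trip being asked for can be pictured with the self-contained sketch below. It deliberately does not call `cognee.add` or touch the real Data table: the `DataItem` here is a simplified stand-in with assumed fields, and the `data` table, `store`, and `fetch_label` are hypothetical, backed by an in-memory SQLite database.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataItem:
    # Simplified, hypothetical stand-in for the PR's DataItem dataclass.
    name: str
    label: Optional[str] = None


def store(conn, item):
    # Persist the item, including its optional label.
    conn.execute(
        "INSERT INTO data (name, label) VALUES (?, ?)", (item.name, item.label)
    )


def fetch_label(conn, name):
    row = conn.execute("SELECT label FROM data WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (name TEXT PRIMARY KEY, label TEXT)")
store(conn, DataItem(name="report.pdf", label="Important"))
store(conn, DataItem(name="notes.txt"))  # label omitted, stored as NULL
```

A real test would replace `store` with the actual ingestion call and query the project's own relational database for the stored label.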

borisarzentar · 2025-12-01T16:01:01Z

cognee/api/v1/add/add.py

    preferred_loaders: Optional[List[Union[str, dict[str, dict[str, Any]]]]] = None,
    incremental_loading: bool = True,
    data_per_batch: Optional[int] = 20,
+    label: Optional[str] = None,


If we add DataItem items, we can read the label directly from them; we don't need an additional label parameter here.
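A rough sketch of that idea, under stated assumptions: the `DataItem` fields and the `unpack` helper below are hypothetical and only show how an ingestion function could accept either raw data or a `DataItem` and take the label from the item, without a separate `label` argument.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class DataItem:
    # Hypothetical minimal shape: a payload plus an optional label.
    data: Any
    label: Optional[str] = None


def unpack(entry: Any) -> Tuple[Any, Optional[str]]:
    # DataItem carries its own label; raw inputs simply have none.
    if isinstance(entry, DataItem):
        return entry.data, entry.label
    return entry, None


inputs = [DataItem("quarterly report text", label="Important"), "plain text"]
unpacked = [unpack(entry) for entry in inputs]
```

This keeps the function signature unchanged for callers who pass raw data, while labeled inputs travel as a single object.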

dexters1 · 2025-12-11T17:20:44Z

Closed due to inactivity

github-actions bot reviewed Nov 11, 2025


coderabbitai bot reviewed Nov 11, 2025


Vasilije1990 changed the base branch from main to dev November 11, 2025 20:03

dexters1 reviewed Nov 12, 2025


2wikimultihop_dev.json (outdated)

pazone self-requested a review November 12, 2025 12:44

chore: remove large JSON test files from tracking

4f58cec

mohitk-patwari force-pushed the feature/add-label-column branch from 763a05d to 4f58cec Compare November 12, 2025 12:54

dexters1 marked this pull request as draft November 12, 2025 15:34

feat(data-model): add 'label' field with Alembic migration, ingestion…

d7a93ca

…, and API integration — verified ORM operations, Alembic upgrades, and end-to-end consistency via verify_db.py

Vasilije1990 added the community-contribution label Nov 19, 2025

Merge branch 'dev' into feature/add-label-column

d724a58

borisarzentar requested changes Dec 1, 2025


sh0116 mentioned this pull request Dec 3, 2025

🎅 I WISH COGNEE HAD... #1270

Closed

dexters1 closed this Dec 11, 2025
