feat: added label field to DataItem dataclass #1778
Please make sure all the checkboxes are checked:
Important

Review skipped

Draft detected. Please check the settings in the CodeRabbit UI.

Walkthrough

A multi-faceted update that introduces backend access control configuration, refactors the retrieval system to support structured outputs via the `response_model` parameter, adds session persistence pipelines, implements chunking with overlap, extends Edge metadata with `edge_text`, and updates numerous tests and examples to align with these changes.
Sequence Diagram(s)
sequenceDiagram
participant App as Application
participant Config as backend_access_control_enabled()
participant Env as Environment
participant DBConfig as Graph/Vector DB Config
participant Flow as Feature Logic
App->>Config: Check if backend access control is enabled
Config->>Env: Read ENABLE_BACKEND_ACCESS_CONTROL
alt ENABLE_BACKEND_ACCESS_CONTROL set
Config->>DBConfig: Call get_graph_context_config() & get_vectordb_context_config()
DBConfig-->>Config: Return provider info
Config->>Config: Validate multi-user support (kuzu, lancedb)
alt Providers supported
Config-->>App: Return True
else Unsupported providers
Config-->>App: Raise EnvironmentError
end
else ENABLE_BACKEND_ACCESS_CONTROL not set
Config->>Config: Call multi_user_support_possible()
alt Multi-user support possible
Config-->>App: Return True
else Not possible
Config-->>App: Return False
end
end
Flow->>App: Apply access control or skip
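The decision flow in the diagram above can be sketched in plain Python. This is a minimal sketch, not the code from `cognee/context_global_variables.py`: the function name `access_control_enabled_sketch`, its parameters, and the provider sets are assumptions made for illustration (the real function reads provider info from the graph and vector DB configs), and the diagram does not say how a false-like value of `ENABLE_BACKEND_ACCESS_CONTROL` is treated, so the sketch only distinguishes "set" from "not set".

```python
import os
from typing import Mapping, Optional

# Assumption taken from the diagram: only these providers support multi-user mode.
SUPPORTED_GRAPH_PROVIDERS = {"kuzu"}
SUPPORTED_VECTOR_PROVIDERS = {"lancedb"}


def multi_user_support_possible(graph_provider: str, vector_provider: str) -> bool:
    """Return True when both configured providers support multi-user mode."""
    return (
        graph_provider.lower() in SUPPORTED_GRAPH_PROVIDERS
        and vector_provider.lower() in SUPPORTED_VECTOR_PROVIDERS
    )


def access_control_enabled_sketch(
    graph_provider: str,
    vector_provider: str,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical stand-in for backend_access_control_enabled()."""
    env = os.environ if env is None else env
    supported = multi_user_support_possible(graph_provider, vector_provider)

    if "ENABLE_BACKEND_ACCESS_CONTROL" not in env:
        # Variable not set: enable access control only if the providers allow it.
        return supported

    # Variable set: unsupported providers are treated as a configuration error.
    if not supported:
        raise EnvironmentError(
            "ENABLE_BACKEND_ACCESS_CONTROL is set, but the configured providers "
            f"({graph_provider}, {vector_provider}) do not support multi-user mode."
        )
    return True
```

Passing the environment as a mapping keeps the sketch testable without mutating `os.environ`.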
sequenceDiagram
participant Client as Client
participant Pipeline as persist_sessions_in_knowledge_graph_pipeline
participant Context as Context Setup
participant Extract as extract_user_sessions
participant Cognify as cognify_session
participant Memify as memify
Client->>Pipeline: Call with user, session_ids, dataset
Pipeline->>Context: set_session_user_context_variable(user)
Pipeline->>Context: Retrieve & validate dataset write access
Pipeline->>Context: set_database_global_context_variables(dataset.id, owner_id)
Pipeline->>Extract: Build extraction task
Extract-->>Pipeline: Generator yields formatted QA data
Pipeline->>Cognify: Build enrichment task (cognify_session)
Pipeline->>Memify: Invoke with tasks, dataset, data
Memify-->>Pipeline: Process & return result
Pipeline-->>Client: Return memify result
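The extract-then-enrich shape of the pipeline above can be illustrated with an async generator feeding a second task. This is only a sketch under stated assumptions: every name ending in `_sketch`, the in-memory `sessions` dictionary, the `has_write_access` flag, and the output format are hypothetical, and the real pipeline delegates to `memify` with user and database context variables that are not modelled here.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Tuple

QAPairs = List[Tuple[str, str]]


async def extract_user_sessions_sketch(
    sessions: Dict[str, QAPairs], session_ids: List[str]
) -> AsyncIterator[str]:
    """Yield one formatted text block per non-empty session (hypothetical format)."""
    for session_id in session_ids:
        qa_pairs = sessions.get(session_id, [])
        if not qa_pairs:
            continue  # skip unknown or empty sessions
        lines = [f"Question: {q}\nAnswer: {a}" for q, a in qa_pairs]
        yield f"Session ID: {session_id}\n" + "\n".join(lines)


async def cognify_session_sketch(text: str, graph: List[str]) -> None:
    """Stand-in for the enrichment task: store the text in a fake graph."""
    graph.append(text)


async def persist_sessions_sketch(
    sessions: Dict[str, QAPairs], session_ids: List[str], has_write_access: bool
) -> List[str]:
    """Validate access, then stream extracted sessions into the enrichment task."""
    if not has_write_access:
        raise PermissionError("User lacks write access to the dataset.")
    graph: List[str] = []
    async for formatted in extract_user_sessions_sketch(sessions, session_ids):
        await cognify_session_sketch(formatted, graph)
    return graph
```

A call such as `asyncio.run(persist_sessions_sketch(sessions, ["s1"], True))` returns the list of stored text blocks.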
sequenceDiagram
participant Caller as Caller
participant Retriever as Retriever (get_completion)
participant Completion as generate_completion()
participant LLM as LLM Engine
Caller->>Retriever: get_completion(query, context, session_id, response_model=MyModel)
Retriever->>Completion: Call with response_model
Completion->>LLM: Request generation with structured output (MyModel)
LLM-->>Completion: Return structured response
Completion-->>Retriever: Return response
Retriever->>Retriever: Wrap in List[Any]
Retriever-->>Caller: Return [response] or [] based on result
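A minimal sketch of the wrapping behaviour in the last diagram, assuming the retriever returns a single-element list for a non-empty completion and an empty list otherwise. `MyModel`, `generate_completion_sketch`, and `get_completion_sketch` are hypothetical stand-ins: the dataclass takes the place of a Pydantic model and the LLM call is replaced by string formatting, so none of this reflects the actual signatures in `cognee/modules/retrieval`.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, List, Optional, Type


@dataclass
class MyModel:
    """Hypothetical structured output type (a Pydantic model in practice)."""

    answer: str


async def generate_completion_sketch(
    query: str, context: str, response_model: Type = str
) -> Any:
    """Stand-in for the LLM call: returns text or an instance of response_model."""
    text = f"answer to {query!r} using {context!r}"
    return text if response_model is str else response_model(answer=text)


async def get_completion_sketch(
    query: str, context: Optional[str] = None, response_model: Type = str
) -> List[Any]:
    """Forward response_model to the generator and wrap the result in a list."""
    completion = await generate_completion_sketch(query, context or "", response_model)
    # A single result becomes a one-element list; an empty result becomes [].
    return [completion] if completion else []
```

With the default `response_model=str` the caller gets a list of strings, which is why the return annotation widens to `List[Any]` once structured models are allowed.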
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Key areas requiring careful attention:
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Hello @mohitk-patwari, thank you for submitting a PR! We will respond as soon as possible.
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
cognee/tests/test_edge_ingestion.py (1)
14-14: Missing `pytest.mark.asyncio` decorator.

The test function is async but lacks the `@pytest.mark.asyncio` decorator. As per coding guidelines, async tests should use this decorator to ensure proper execution by pytest.

Apply this diff:

    +import pytest
    +
    +@pytest.mark.asyncio
     async def test_edge_ingestion():

cognee/modules/retrieval/graph_completion_retriever.py (1)
205-208: Ensure saved interactions serialize structured responses

When `response_model` returns a non-string object (dataclass/Pydantic), the code still forwards that object to `save_qa`, which concatenates it with `question` at Line 234. That raises `TypeError: can only concatenate str (not "<type>") to str`, so any caller enabling structured outputs while `save_interaction=True` will crash. Convert completions to strings (or another serializable form) before saving.

Apply this diff to preserve existing behavior for strings while safely handling structured outputs:

    @@
    -        if self.save_interaction and context and triplets and completion:
    -            await self.save_qa(
    -                question=query, answer=completion, context=context_text, triplets=triplets
    -            )
    +        if self.save_interaction and context and triplets and completion:
    +            serialized_completion = (
    +                completion if isinstance(completion, str) else str(completion)
    +            )
    +            await self.save_qa(
    +                question=query,
    +                answer=serialized_completion,
    +                context=context_text,
    +                triplets=triplets,
    +            )

cognee/modules/retrieval/graph_completion_cot_retriever.py (1)
167-199: Fix docstring inconsistencies.

The docstring has two inconsistencies with the actual method signature:
- Line 188: Documents `context` as `Optional[Any]`, but the signature specifies `Optional[List[Edge]]`
- Line 199: Documents return type as `List[str]`, but the signature returns `List[Any]`

Apply this diff to fix the docstring:

    -    - context (Optional[Any]): Optional context that may assist in answering the query.
    +    - context (Optional[List[Edge]]): Optional context that may assist in answering the query.
           If not provided, it will be fetched based on the query. (default None)
         - session_id (Optional[str]): Optional session identifier for caching. If None, defaults to 'default_session'. (default None)
         - max_iter: The maximum number of iterations to refine the answer and generate follow-up questions. (default 4)
         - response_model (Type): The Pydantic model type for structured output. (default str)

     Returns:
     --------

    -    - List[str]: A list containing the generated answer to the user's query.
    +    - List[Any]: A list containing the generated answer to the user's query.
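Relating to the `save_qa` comment above for `graph_completion_retriever.py`, the failure mode and the suggested guard can be shown in isolation. This is a sketch under assumptions: `StructuredAnswer`, `serialize_completion`, and `format_qa_sketch` are hypothetical names, the dataclass stands in for whatever `response_model` returns, and `str()` is only one possible serialization.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class StructuredAnswer:
    """Hypothetical structured completion (stands in for a response_model result)."""

    text: str


def serialize_completion(completion: Any) -> str:
    """Leave strings untouched; fall back to str() for structured objects."""
    return completion if isinstance(completion, str) else str(completion)


def format_qa_sketch(question: str, completion: Any) -> str:
    """Concatenate question and answer, which requires both operands to be str."""
    return question + " " + serialize_completion(completion)
```

Without the guard, `"Q?" + " " + StructuredAnswer(text="A")` raises `TypeError`, which is the crash described in the comment.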
🧹 Nitpick comments (8)
cognee/modules/pipelines/models/DataItem.py (1)
1-4: Consider using relative import for consistency.

While the absolute import on line 3 is correct, the module's `__init__.py` uses relative imports (e.g., `from .DataItemStatus import DataItemStatus`). For consistency with the existing codebase pattern, consider using a relative import.

Apply this diff if you prefer to align with the existing import style:

    -from cognee.modules.pipelines.models.DataItemStatus import DataItemStatus
    +from .DataItemStatus import DataItemStatus

cognee/tests/unit/modules/pipelines/test_data_item_label.py (1)
4-12: Consider adding test coverage for the default label value.

The test correctly validates the label field when explicitly provided. To ensure comprehensive coverage of the new optional field, consider adding a test case that verifies the default behavior (`label=None` when not provided).

Add this test function to verify the default case:

    def test_data_item_label_field_default():
        item = DataItem(
            id="124",
            name="Sample Item Without Label",
            source="mock_source",
            status=DataItemStatus.DATA_ITEM_PROCESSING_COMPLETED
        )
        assert item.label is None

cognee/modules/retrieval/cypher_search_retriever.py (1)
55-55: Remove unnecessary `jsonable_encoder` wrapper for query results.

Both graph backends return JSON-serializable types:
- Neo4j adapter returns `List[Dict[str, Any]]` directly from `result.data()`
- Kuzu adapter returns `List[Tuple]` with scalar values

Since the query results are already JSON-serializable, `jsonable_encoder` adds unnecessary overhead:

    result = await graph_engine.query(query)

Remove the `jsonable_encoder` wrapper at line 55 unless query results contain Pydantic models or custom objects requiring special encoding.

cognee/tasks/feedback/generate_improved_answers.py (1)
72-76: Consider defensive attribute access for `completion[0]`.

While `response_model=ImprovedAnswerResponse` should ensure the correct type, accessing `completion[0].answer` and `completion[0].explanation` without verification could raise `AttributeError` if the response_model contract is not met. Consider wrapping these accesses in a try-except block or adding a type check.

Apply this diff to add defensive handling:

     if completion:
    -    enrichment.improved_answer = completion[0].answer
    +    try:
    +        enrichment.improved_answer = completion[0].answer
    +        enrichment.explanation = completion[0].explanation
    +    except (AttributeError, IndexError) as e:
    +        logger.warning(
    +            "Unexpected completion format",
    +            error=str(e),
    +            question=enrichment.question,
    +        )
    +        return None
         enrichment.new_context = new_context_text
    -    enrichment.explanation = completion[0].explanation
         return enrichment

cognee/tests/test_search_db.py (1)
150-155: Consider moving import to module level.

The import of `backend_access_control_enabled` is placed inside the loop. While this works, moving it to the module level (alongside other imports) would improve readability and follow standard Python conventions.

Apply this diff:

     from cognee.modules.search.types import SearchType
     from collections import Counter
    +from cognee.context_global_variables import backend_access_control_enabled

     logger = get_logger()
     ...
         for name, search_results in [
             ...
         ]:
             assert isinstance(search_results, list), f"{name}: should return a list"
             assert len(search_results) == 1, (
                 f"{name}: expected single-element list, got {len(search_results)}"
             )
    -        from cognee.context_global_variables import backend_access_control_enabled
    -
             if backend_access_control_enabled():
                 text = search_results[0]["search_result"][0]
             else:
                 text = search_results[0]

cognee/tasks/memify/cognify_session.py (1)
40-41: Improve exception handling following Python best practices.

The error handling can be improved in two ways:
- Use `logger.exception()` instead of `logger.error()` to automatically include the traceback
- Chain the exception with `raise ... from e` to preserve the exception context

Apply this diff:

    -        logger.error(f"Error cognifying session data: {str(e)}")
    -        raise CogneeSystemError(message=f"Failed to cognify session data: {str(e)}", log=False)
    +        logger.exception("Error cognifying session data")
    +        raise CogneeSystemError(message=f"Failed to cognify session data: {str(e)}", log=False) from e

cognee/modules/retrieval/EntityCompletionRetriever.py (1)
114-116: Docstring return type should match the new signature

Line 114 still documents `List[str]`, but the method now returns `List[Any]` when a structured `response_model` is supplied. Please align the docstring with the actual return shape so downstream callers aren’t misled.

cognee/modules/retrieval/graph_completion_context_extension_retriever.py (1)
85-87: Docstring return type should reflect structured responses

Line 85 still advertises a `List[str]`, but the signature now returns `List[Any]` to accommodate custom `response_model` types. Please update the docstring so it mirrors the broadened return type.
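Relating to the `cognify_session.py` nitpick above, the effect of `logger.exception()` plus `raise ... from e` can be seen in a small standard-library sketch. `SessionProcessingError` and `process_session_sketch` are hypothetical stand-ins (the real code raises `CogneeSystemError` and uses the shared cognee logger), so treat this as an illustration of the Python mechanism rather than of the library's code.

```python
import logging

logger = logging.getLogger(__name__)


class SessionProcessingError(Exception):
    """Hypothetical stand-in for CogneeSystemError."""


def process_session_sketch(data: str) -> str:
    """Process session text, logging and re-raising failures with context."""
    try:
        if not data:
            raise ValueError("session data is empty")
        return data.strip()
    except Exception as error:
        # logger.exception logs at ERROR level and appends the active traceback.
        logger.exception("Error cognifying session data")
        # "from error" stores the original exception on __cause__.
        raise SessionProcessingError(
            f"Failed to cognify session data: {error}"
        ) from error
```

A caller that catches `SessionProcessingError` can then inspect `__cause__` to reach the original `ValueError`.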
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (9)
- `.github/workflows/examples_tests.yml` is excluded by `!**/*.yml`
- `.github/workflows/load_tests.yml` is excluded by `!**/*.yml`
- `.github/workflows/release_test.yml` is excluded by `!**/*.yml`
- `.github/workflows/search_db_tests.yml` is excluded by `!**/*.yml`
- `.github/workflows/test_different_operating_systems.yml` is excluded by `!**/*.yml`
- `.github/workflows/test_suites.yml` is excluded by `!**/*.yml`
- `2wikimultihop_dev.json` is excluded by `!**/*.json`
- `docker-compose.yml` is excluded by `!**/*.yml`
- `hotpot_benchmark.json` is excluded by `!**/*.json`
📒 Files selected for processing (68)
- .env.template (1 hunks)
- cognee/__init__.py (1 hunks)
- cognee/api/client.py (2 hunks)
- cognee/context_global_variables.py (3 hunks)
- cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (1 hunks)
- cognee/infrastructure/databases/vector/create_vector_engine.py (1 hunks)
- cognee/infrastructure/engine/models/Edge.py (2 hunks)
- cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py (1 hunks)
- cognee/modules/chunking/models/DocumentChunk.py (2 hunks)
- cognee/modules/chunking/text_chunker_with_overlap.py (1 hunks)
- cognee/modules/graph/cognee_graph/CogneeGraph.py (1 hunks)
- cognee/modules/graph/utils/expand_with_nodes_and_edges.py (2 hunks)
- cognee/modules/graph/utils/resolve_edges_to_text.py (1 hunks)
- cognee/modules/pipelines/models/DataItem.py (1 hunks)
- cognee/modules/pipelines/models/__init__.py (1 hunks)
- cognee/modules/retrieval/EntityCompletionRetriever.py (5 hunks)
- cognee/modules/retrieval/base_graph_retriever.py (2 hunks)
- cognee/modules/retrieval/base_retriever.py (2 hunks)
- cognee/modules/retrieval/completion_retriever.py (6 hunks)
- cognee/modules/retrieval/cypher_search_retriever.py (2 hunks)
- cognee/modules/retrieval/graph_completion_context_extension_retriever.py (5 hunks)
- cognee/modules/retrieval/graph_completion_cot_retriever.py (4 hunks)
- cognee/modules/retrieval/graph_completion_retriever.py (3 hunks)
- cognee/modules/retrieval/temporal_retriever.py (4 hunks)
- cognee/modules/retrieval/utils/brute_force_triplet_search.py (1 hunks)
- cognee/modules/retrieval/utils/completion.py (2 hunks)
- cognee/modules/run_custom_pipeline/__init__.py (1 hunks)
- cognee/modules/run_custom_pipeline/run_custom_pipeline.py (1 hunks)
- cognee/modules/search/methods/search.py (3 hunks)
- cognee/modules/users/methods/get_authenticated_user.py (1 hunks)
- cognee/modules/users/methods/get_default_user.py (1 hunks)
- cognee/tasks/feedback/generate_improved_answers.py (2 hunks)
- cognee/tasks/memify/__init__.py (1 hunks)
- cognee/tasks/memify/cognify_session.py (1 hunks)
- cognee/tasks/memify/extract_user_sessions.py (1 hunks)
- cognee/tasks/storage/index_data_points.py (1 hunks)
- cognee/tasks/storage/index_graph_edges.py (3 hunks)
- cognee/tests/test_add_docling_document.py (1 hunks)
- cognee/tests/test_conversation_history.py (3 hunks)
- cognee/tests/test_edge_ingestion.py (1 hunks)
- cognee/tests/test_feedback_enrichment.py (1 hunks)
- cognee/tests/test_library.py (1 hunks)
- cognee/tests/test_load.py (1 hunks)
- cognee/tests/test_relational_db_migration.py (1 hunks)
- cognee/tests/test_search_db.py (1 hunks)
- cognee/tests/unit/api/test_conditional_authentication_endpoints.py (6 hunks)
- cognee/tests/unit/infrastructure/databases/test_index_data_points.py (1 hunks)
- cognee/tests/unit/infrastructure/databases/test_index_graph_edges.py (2 hunks)
- cognee/tests/unit/modules/chunking/test_text_chunker.py (1 hunks)
- cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py (1 hunks)
- cognee/tests/unit/modules/memify_tasks/test_cognify_session.py (1 hunks)
- cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py (1 hunks)
- cognee/tests/unit/modules/pipelines/test_data_item_label.py (1 hunks)
- cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py (0 hunks)
- cognee/tests/unit/modules/retrieval/rag_completion_retriever_test.py (1 hunks)
- cognee/tests/unit/modules/retrieval/structured_output_test.py (1 hunks)
- cognee/tests/unit/modules/retrieval/summaries_retriever_test.py (1 hunks)
- cognee/tests/unit/modules/retrieval/temporal_retriever_test.py (0 hunks)
- cognee/tests/unit/modules/users/test_conditional_authentication.py (0 hunks)
- entrypoint.sh (1 hunks)
- examples/python/agentic_reasoning_procurement_example.py (1 hunks)
- examples/python/code_graph_example.py (2 hunks)
- examples/python/conversation_session_persistence_example.py (1 hunks)
- examples/python/feedback_enrichment_minimal_example.py (0 hunks)
- examples/python/memify_coding_agent_example.py (1 hunks)
- examples/python/relational_database_migration_example.py (1 hunks)
- examples/python/run_custom_pipeline_example.py (1 hunks)
- examples/python/simple_example.py (0 hunks)
💤 Files with no reviewable changes (5)
- examples/python/feedback_enrichment_minimal_example.py
- cognee/tests/unit/modules/retrieval/temporal_retriever_test.py
- examples/python/simple_example.py
- cognee/tests/unit/modules/users/test_conditional_authentication.py
- cognee/tests/unit/modules/retrieval/graph_completion_retriever_cot_test.py
🧰 Additional context used
📓 Path-based instructions (5)
{cognee,cognee-mcp,distributed,examples,alembic}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
{cognee,cognee-mcp,distributed,examples,alembic}/**/*.py: Use 4-space indentation; name modules and functions in snake_case; name classes in PascalCase (Python)
Adhere to ruff rules, including import hygiene and configured line length (100)
Keep Python lines ≤ 100 characters
Files:
- examples/python/code_graph_example.py
- cognee/tests/unit/modules/retrieval/summaries_retriever_test.py
- cognee/modules/run_custom_pipeline/run_custom_pipeline.py
- cognee/modules/graph/cognee_graph/CogneeGraph.py
- cognee/tests/test_library.py
- cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py
- cognee/modules/pipelines/models/DataItem.py
- cognee/tests/test_add_docling_document.py
- cognee/modules/search/methods/search.py
- cognee/context_global_variables.py
- cognee/tasks/feedback/generate_improved_answers.py
- cognee/tests/unit/modules/retrieval/rag_completion_retriever_test.py
- cognee/tests/test_search_db.py
- cognee/tasks/storage/index_graph_edges.py
- cognee/tasks/storage/index_data_points.py
- cognee/tests/unit/modules/chunking/test_text_chunker.py
- cognee/modules/run_custom_pipeline/__init__.py
- cognee/tests/test_relational_db_migration.py
- cognee/modules/pipelines/models/__init__.py
- cognee/modules/retrieval/graph_completion_retriever.py
- cognee/tests/unit/infrastructure/databases/test_index_data_points.py
- cognee/tasks/memify/extract_user_sessions.py
- examples/python/run_custom_pipeline_example.py
- cognee/api/client.py
- cognee/modules/retrieval/completion_retriever.py
- cognee/modules/users/methods/get_default_user.py
- cognee/tests/unit/modules/memify_tasks/test_cognify_session.py
- cognee/tasks/memify/cognify_session.py
- cognee/modules/chunking/models/DocumentChunk.py
- cognee/__init__.py
- examples/python/conversation_session_persistence_example.py
- cognee/infrastructure/engine/models/Edge.py
- cognee/modules/retrieval/utils/brute_force_triplet_search.py
- cognee/tests/test_feedback_enrichment.py
- cognee/modules/retrieval/graph_completion_context_extension_retriever.py
- examples/python/memify_coding_agent_example.py
- cognee/modules/graph/utils/expand_with_nodes_and_edges.py
- cognee/infrastructure/databases/vector/create_vector_engine.py
- cognee/tasks/memify/__init__.py
- cognee/modules/retrieval/base_graph_retriever.py
- cognee/modules/chunking/text_chunker_with_overlap.py
- cognee/modules/retrieval/temporal_retriever.py
- cognee/tests/test_load.py
- cognee/modules/retrieval/base_retriever.py
- cognee/modules/retrieval/cypher_search_retriever.py
- cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py
- cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py
- cognee/tests/unit/api/test_conditional_authentication_endpoints.py
- examples/python/agentic_reasoning_procurement_example.py
- cognee/tests/unit/infrastructure/databases/test_index_graph_edges.py
- cognee/tests/test_conversation_history.py
- cognee/modules/graph/utils/resolve_edges_to_text.py
- cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py
- cognee/tests/test_edge_ingestion.py
- examples/python/relational_database_migration_example.py
- cognee/modules/retrieval/utils/completion.py
- cognee/modules/users/methods/get_authenticated_user.py
- cognee/modules/retrieval/EntityCompletionRetriever.py
- cognee/tests/unit/modules/retrieval/structured_output_test.py
- cognee/tests/unit/modules/pipelines/test_data_item_label.py
- cognee/modules/retrieval/graph_completion_cot_retriever.py
examples/python/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
When adding public APIs, provide or update targeted examples under examples/python/
Files:
- examples/python/code_graph_example.py
- examples/python/run_custom_pipeline_example.py
- examples/python/conversation_session_persistence_example.py
- examples/python/memify_coding_agent_example.py
- examples/python/agentic_reasoning_procurement_example.py
- examples/python/relational_database_migration_example.py
cognee/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
cognee/**/*.py: Public APIs in the core library should be type-annotated where practical
Prefer explicit, structured error handling and use shared logging utilities from cognee.shared.logging_utils
Files:
- cognee/tests/unit/modules/retrieval/summaries_retriever_test.py
- cognee/modules/run_custom_pipeline/run_custom_pipeline.py
- cognee/modules/graph/cognee_graph/CogneeGraph.py
- cognee/tests/test_library.py
- cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py
- cognee/modules/pipelines/models/DataItem.py
- cognee/tests/test_add_docling_document.py
- cognee/modules/search/methods/search.py
- cognee/context_global_variables.py
- cognee/tasks/feedback/generate_improved_answers.py
- cognee/tests/unit/modules/retrieval/rag_completion_retriever_test.py
- cognee/tests/test_search_db.py
- cognee/tasks/storage/index_graph_edges.py
- cognee/tasks/storage/index_data_points.py
- cognee/tests/unit/modules/chunking/test_text_chunker.py
- cognee/modules/run_custom_pipeline/__init__.py
- cognee/tests/test_relational_db_migration.py
- cognee/modules/pipelines/models/__init__.py
- cognee/modules/retrieval/graph_completion_retriever.py
- cognee/tests/unit/infrastructure/databases/test_index_data_points.py
- cognee/tasks/memify/extract_user_sessions.py
- cognee/api/client.py
- cognee/modules/retrieval/completion_retriever.py
- cognee/modules/users/methods/get_default_user.py
- cognee/tests/unit/modules/memify_tasks/test_cognify_session.py
- cognee/tasks/memify/cognify_session.py
- cognee/modules/chunking/models/DocumentChunk.py
- cognee/__init__.py
- cognee/infrastructure/engine/models/Edge.py
- cognee/modules/retrieval/utils/brute_force_triplet_search.py
- cognee/tests/test_feedback_enrichment.py
- cognee/modules/retrieval/graph_completion_context_extension_retriever.py
- cognee/modules/graph/utils/expand_with_nodes_and_edges.py
- cognee/infrastructure/databases/vector/create_vector_engine.py
- cognee/tasks/memify/__init__.py
- cognee/modules/retrieval/base_graph_retriever.py
- cognee/modules/chunking/text_chunker_with_overlap.py
- cognee/modules/retrieval/temporal_retriever.py
- cognee/tests/test_load.py
- cognee/modules/retrieval/base_retriever.py
- cognee/modules/retrieval/cypher_search_retriever.py
- cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py
- cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py
- cognee/tests/unit/api/test_conditional_authentication_endpoints.py
- cognee/tests/unit/infrastructure/databases/test_index_graph_edges.py
- cognee/tests/test_conversation_history.py
- cognee/modules/graph/utils/resolve_edges_to_text.py
- cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py
- cognee/tests/test_edge_ingestion.py
- cognee/modules/retrieval/utils/completion.py
- cognee/modules/users/methods/get_authenticated_user.py
- cognee/modules/retrieval/EntityCompletionRetriever.py
- cognee/tests/unit/modules/retrieval/structured_output_test.py
- cognee/tests/unit/modules/pipelines/test_data_item_label.py
- cognee/modules/retrieval/graph_completion_cot_retriever.py
cognee/tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Place unit tests under cognee/tests/unit/
Files:
- cognee/tests/unit/modules/retrieval/summaries_retriever_test.py
- cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py
- cognee/tests/unit/modules/retrieval/rag_completion_retriever_test.py
- cognee/tests/unit/modules/chunking/test_text_chunker.py
- cognee/tests/unit/infrastructure/databases/test_index_data_points.py
- cognee/tests/unit/modules/memify_tasks/test_cognify_session.py
- cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py
- cognee/tests/unit/api/test_conditional_authentication_endpoints.py
- cognee/tests/unit/infrastructure/databases/test_index_graph_edges.py
- cognee/tests/unit/modules/retrieval/structured_output_test.py
- cognee/tests/unit/modules/pipelines/test_data_item_label.py
cognee/tests/**/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
cognee/tests/**/test_*.py: Name test files as test_*.py
Use pytest.mark.asyncio for async tests
Tests should avoid external state; rely on fixtures and CI-provided env vars when providers are required
Files:
- cognee/tests/test_library.py
- cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py
- cognee/tests/test_add_docling_document.py
- cognee/tests/test_search_db.py
- cognee/tests/unit/modules/chunking/test_text_chunker.py
- cognee/tests/test_relational_db_migration.py
- cognee/tests/unit/infrastructure/databases/test_index_data_points.py
- cognee/tests/unit/modules/memify_tasks/test_cognify_session.py
- cognee/tests/test_feedback_enrichment.py
- cognee/tests/test_load.py
- cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py
- cognee/tests/unit/api/test_conditional_authentication_endpoints.py
- cognee/tests/unit/infrastructure/databases/test_index_graph_edges.py
- cognee/tests/test_conversation_history.py
- cognee/tests/test_edge_ingestion.py
- cognee/tests/unit/modules/pipelines/test_data_item_label.py
🧠 Learnings (8)
📚 Learning: 2024-11-13T16:17:17.646Z
Learnt from: hajdul88
Repo: topoteretes/cognee PR: 196
File: cognee/modules/graph/cognee_graph/CogneeGraphElements.py:82-90
Timestamp: 2024-11-13T16:17:17.646Z
Learning: In `cognee/modules/graph/cognee_graph/CogneeGraphElements.py`, within the `Edge` class, nodes and edges can have different dimensions, and it's acceptable for them not to match.
Applied to files:
- cognee/modules/graph/cognee_graph/CogneeGraph.py
- cognee/modules/chunking/models/DocumentChunk.py
- cognee/modules/retrieval/graph_completion_context_extension_retriever.py
- cognee/modules/graph/utils/expand_with_nodes_and_edges.py
- cognee/modules/retrieval/base_graph_retriever.py
📚 Learning: 2024-11-13T16:06:32.576Z
Learnt from: hajdul88
Repo: topoteretes/cognee PR: 196
File: cognee/modules/graph/cognee_graph/CogneeGraph.py:32-38
Timestamp: 2024-11-13T16:06:32.576Z
Learning: In `CogneeGraph.py`, within the `CogneeGraph` class, it's intentional to add skeleton edges in both the `add_edge` method and the `project_graph_from_db` method to ensure that edges are added to the graph and to the nodes.
Applied to files:
- cognee/modules/graph/cognee_graph/CogneeGraph.py
- cognee/modules/retrieval/utils/brute_force_triplet_search.py
- cognee/modules/graph/utils/expand_with_nodes_and_edges.py
📚 Learning: 2024-11-13T14:55:05.912Z
Learnt from: 0xideas
Repo: topoteretes/cognee PR: 205
File: cognee/tests/unit/processing/chunks/chunk_by_paragraph_test.py:7-7
Timestamp: 2024-11-13T14:55:05.912Z
Learning: When changes are made to the chunking implementation in `cognee/tasks/chunks`, the ground truth values in the corresponding tests in `cognee/tests/unit/processing/chunks` need to be updated accordingly.
Applied to files:
- cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py
- cognee/tests/unit/modules/chunking/test_text_chunker.py
- cognee/modules/chunking/models/DocumentChunk.py
- cognee/modules/chunking/text_chunker_with_overlap.py
📚 Learning: 2024-12-04T18:37:55.092Z
Learnt from: hajdul88
Repo: topoteretes/cognee PR: 251
File: cognee/tests/infrastructure/databases/test_index_graph_edges.py:0-0
Timestamp: 2024-12-04T18:37:55.092Z
Learning: In the `index_graph_edges` function, both graph engine and vector engine initialization failures are handled within the same try-except block, so a single test covers both cases.
Applied to files:
- cognee/tasks/storage/index_graph_edges.py
- cognee/tests/unit/infrastructure/databases/test_index_data_points.py
- cognee/tests/unit/infrastructure/databases/test_index_graph_edges.py
📚 Learning: 2025-10-11T04:18:24.594Z
Learnt from: Vattikuti-Manideep-Sitaram
Repo: topoteretes/cognee PR: 1529
File: cognee/api/v1/cognify/ontology_graph_pipeline.py:69-74
Timestamp: 2025-10-11T04:18:24.594Z
Learning: The code_graph_pipeline.py and ontology_graph_pipeline.py both follow an established pattern of calling cognee.prune.prune_data() and cognee.prune.prune_system(metadata=True) at the start of pipeline execution. This appears to be intentional behavior for pipeline operations in the cognee codebase.
Applied to files:
examples/python/run_custom_pipeline_example.py
📚 Learning: 2025-10-27T09:21:14.154Z
Learnt from: CR
Repo: topoteretes/cognee PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-27T09:21:14.154Z
Learning: Applies to cognee/tests/unit/**/*.py : Place unit tests under cognee/tests/unit/
Applied to files:
cognee/tests/unit/modules/memify_tasks/test_cognify_session.py
📚 Learning: 2025-10-27T09:21:14.154Z
Learnt from: CR
Repo: topoteretes/cognee PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-27T09:21:14.154Z
Learning: Applies to cognee/tests/**/test_*.py : Use pytest.mark.asyncio for async tests
Applied to files:
- cognee/tests/unit/modules/memify_tasks/test_cognify_session.py
- cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py
📚 Learning: 2025-10-27T09:21:14.154Z
Learnt from: CR
Repo: topoteretes/cognee PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-10-27T09:21:14.154Z
Learning: Applies to cognee/tests/**/test_*.py : Tests should avoid external state; rely on fixtures and CI-provided env vars when providers are required
Applied to files:
cognee/tests/unit/api/test_conditional_authentication_endpoints.py
🧬 Code graph analysis (38)
cognee/modules/run_custom_pipeline/run_custom_pipeline.py (3)
cognee/shared/logging_utils.py (1)
get_logger(212-224)
cognee/modules/pipelines/tasks/task.py (1)
Task(5-97)
cognee/modules/pipelines/layers/pipeline_execution_mode.py (1)
get_pipeline_executor(117-127)
cognee/tests/test_library.py (1)
cognee/modules/search/types/SearchType.py (1)
SearchType(4-19)
cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py (2)
cognee/modules/chunking/text_chunker_with_overlap.py (2)
TextChunkerWithOverlap(11-124)
read(112-124)
cognee/tasks/chunks/chunk_by_paragraph.py (1)
chunk_by_paragraph(7-96)
cognee/modules/pipelines/models/DataItem.py (1)
cognee/modules/pipelines/models/DataItemStatus.py (1)
DataItemStatus(4-5)
cognee/modules/search/methods/search.py (1)
cognee/context_global_variables.py (1)
backend_access_control_enabled(36-50)
cognee/context_global_variables.py (2)
cognee/infrastructure/databases/vector/config.py (1)
get_vectordb_context_config(84-90)
cognee/infrastructure/databases/graph/config.py (1)
get_graph_context_config(140-148)
cognee/tasks/feedback/generate_improved_answers.py (2)
cognee/modules/retrieval/EntityCompletionRetriever.py (1)
get_completion(87-165)
cognee/modules/retrieval/completion_retriever.py (1)
get_completion(77-147)
cognee/tests/test_search_db.py (1)
cognee/context_global_variables.py (1)
backend_access_control_enabled(36-50)
cognee/tasks/storage/index_graph_edges.py (5)
cognee/modules/engine/utils/generate_edge_id.py (1)
generate_edge_id(4-5)
cognee/infrastructure/databases/graph/get_graph_engine.py (1)
get_graph_engine(10-24)
cognee/tasks/storage/index_data_points.py (1)
index_data_points(10-65)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (1)
index_data_points(251-263)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (1)
index_data_points(297-319)
cognee/tasks/storage/index_data_points.py (6)
cognee/infrastructure/databases/vector/get_vector_engine.py (1)
get_vector_engine(5-7)
cognee/infrastructure/databases/vector/lancedb/LanceDBAdapter.py (2)
create_vector_index(292-295)
index_data_points(297-309)
cognee/infrastructure/databases/vector/pgvector/PGVectorAdapter.py (2)
create_vector_index(248-249)
index_data_points(251-263)
cognee/infrastructure/databases/vector/chromadb/ChromaDBAdapter.py (2)
create_vector_index(285-295)
index_data_points(297-319)
cognee/infrastructure/databases/vector/embeddings/OllamaEmbeddingEngine.py (1)
get_batch_size(140-147)
cognee/infrastructure/databases/vector/embeddings/EmbeddingEngine.py (1)
get_batch_size(38-45)
cognee/tests/unit/modules/chunking/test_text_chunker.py (3)
cognee/modules/chunking/TextChunker.py (1)
TextChunker(11-78)cognee/modules/chunking/text_chunker_with_overlap.py (2)
TextChunkerWithOverlap(11-124)read(112-124)cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py (4)
make_text_generator(14-24)_factory(17-22)_factory(31-43)gen(18-20)
cognee/modules/run_custom_pipeline/__init__.py (1)
cognee/modules/run_custom_pipeline/run_custom_pipeline.py (1)
run_custom_pipeline(14-69)
cognee/modules/pipelines/models/__init__.py (1)
cognee/modules/pipelines/models/DataItem.py (1)
DataItem(6-11)
cognee/tests/unit/infrastructure/databases/test_index_data_points.py (2)
cognee/tasks/storage/index_data_points.py (1)
index_data_points(10-65)cognee/infrastructure/engine/models/DataPoint.py (1)
DataPoint(20-220)
cognee/tasks/memify/extract_user_sessions.py (3)
cognee/exceptions/exceptions.py (1)
CogneeSystemError(38-49)cognee/infrastructure/databases/cache/get_cache_engine.py (1)
get_cache_engine(54-67)cognee/shared/logging_utils.py (2)
get_logger(212-224)info(205-205)
examples/python/run_custom_pipeline_example.py (8)
cognee/modules/users/methods/get_default_user.py (1)
get_default_user(13-36)cognee/shared/logging_utils.py (1)
setup_logging(288-555)cognee/modules/pipelines/tasks/task.py (1)
Task(5-97)cognee/modules/search/types/SearchType.py (1)
SearchType(4-19)cognee/tasks/ingestion/ingest_data.py (1)
ingest_data(25-199)cognee/tasks/ingestion/resolve_data_directories.py (1)
resolve_data_directories(10-84)cognee/modules/run_custom_pipeline/run_custom_pipeline.py (1)
run_custom_pipeline(14-69)cognee/api/v1/cognify/cognify.py (1)
get_default_tasks(246-297)
cognee/api/client.py (1)
cognee/shared/logging_utils.py (2)
setup_logging(288-555)info(205-205)
cognee/modules/users/methods/get_default_user.py (1)
cognee/modules/users/models/User.py (1)
User(13-40)
cognee/tests/unit/modules/memify_tasks/test_cognify_session.py (2)
cognee/tasks/memify/cognify_session.py (1)
cognify_session(9-41)cognee/exceptions/exceptions.py (2)
CogneeValidationError(52-63)CogneeSystemError(38-49)
cognee/tasks/memify/cognify_session.py (2)
cognee/exceptions/exceptions.py (2)
CogneeValidationError(52-63)CogneeSystemError(38-49)cognee/shared/logging_utils.py (3)
get_logger(212-224)info(205-205)debug(209-209)
cognee/modules/chunking/models/DocumentChunk.py (1)
cognee/infrastructure/engine/models/Edge.py (1)
Edge(5-38)
cognee/__init__.py (1)
cognee/modules/run_custom_pipeline/run_custom_pipeline.py (1)
run_custom_pipeline(14-69)
examples/python/conversation_session_persistence_example.py (5)
cognee/api/v1/visualize/visualize.py (1)
visualize_graph(14-27)cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py (1)
persist_sessions_in_knowledge_graph_pipeline(19-55)cognee/modules/search/types/SearchType.py (1)
SearchType(4-19)cognee/modules/users/methods/get_default_user.py (1)
get_default_user(13-36)cognee/shared/logging_utils.py (1)
get_logger(212-224)
cognee/modules/graph/utils/expand_with_nodes_and_edges.py (1)
cognee/infrastructure/engine/models/Edge.py (1)
Edge(5-38)
cognee/tasks/memify/__init__.py (2)
cognee/tasks/memify/cognify_session.py (1)
cognify_session(9-41)cognee/tasks/memify/extract_user_sessions.py (1)
extract_user_sessions(12-73)
cognee/modules/chunking/text_chunker_with_overlap.py (4)
cognee/shared/logging_utils.py (1)
get_logger(212-224)cognee/tasks/chunks/chunk_by_paragraph.py (1)
chunk_by_paragraph(7-96)cognee/modules/chunking/Chunker.py (1)
Chunker(1-12)cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py (1)
get_chunk_data(272-275)
cognee/tests/test_load.py (3)
cognee/modules/search/types/SearchType.py (1)
SearchType(4-19)cognee/shared/logging_utils.py (1)
get_logger(212-224)cognee/api/v1/config/config.py (2)
data_root_directory(36-38)system_root_directory(18-33)
cognee/modules/retrieval/cypher_search_retriever.py (2)
cognee/infrastructure/databases/graph/kuzu/adapter.py (1)
query(210-278)cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1)
query(100-128)
cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py (7)
cognee/context_global_variables.py (2)
set_database_global_context_variables(53-113)set_session_user_context_variable(23-24)cognee/exceptions/exceptions.py (1)
CogneeValidationError(52-63)cognee/modules/data/methods/get_authorized_existing_datasets.py (1)
get_authorized_existing_datasets(11-39)cognee/shared/logging_utils.py (2)
get_logger(212-224)info(205-205)cognee/modules/pipelines/tasks/task.py (1)
Task(5-97)cognee/tasks/memify/extract_user_sessions.py (1)
extract_user_sessions(12-73)cognee/tasks/memify/cognify_session.py (1)
cognify_session(9-41)
cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py (2)
cognee/tasks/memify/extract_user_sessions.py (1)
extract_user_sessions(12-73)cognee/exceptions/exceptions.py (1)
CogneeSystemError(38-49)
cognee/tests/unit/infrastructure/databases/test_index_graph_edges.py (1)
cognee/tasks/storage/index_graph_edges.py (1)
index_graph_edges(42-77)
cognee/tests/test_conversation_history.py (3)
cognee/infrastructure/databases/graph/get_graph_engine.py (1)
get_graph_engine(10-24)cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py (1)
persist_sessions_in_knowledge_graph_pipeline(19-55)cognee/infrastructure/databases/vector/get_vector_engine.py (1)
get_vector_engine(5-7)
cognee/modules/graph/utils/resolve_edges_to_text.py (2)
cognee/infrastructure/engine/models/Edge.py (1)
Edge(5-38)cognee/modules/retrieval/graph_completion_retriever.py (1)
resolve_edges_to_text(60-74)
cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py (1)
cognee/infrastructure/databases/graph/neptune_driver/adapter.py (1)
query(165-197)
cognee/modules/users/methods/get_authenticated_user.py (2)
cognee/context_global_variables.py (1)
backend_access_control_enabled(36-50)cognee/shared/logging_utils.py (1)
get_logger(212-224)
cognee/tests/unit/modules/retrieval/structured_output_test.py (9)
cognee/infrastructure/engine/models/DataPoint.py (1)
DataPoint(20-220)cognee/modules/data/processing/document_types/TextDocument.py (1)
TextDocument(6-22)cognee/modules/retrieval/graph_completion_cot_retriever.py (1)
GraphCompletionCotRetriever(39-235)cognee/modules/retrieval/graph_completion_retriever.py (1)
GraphCompletionRetriever(28-284)cognee/modules/retrieval/graph_completion_context_extension_retriever.py (1)
GraphCompletionContextExtensionRetriever(17-174)cognee/modules/retrieval/EntityCompletionRetriever.py (1)
EntityCompletionRetriever(20-165)cognee/modules/retrieval/temporal_retriever.py (1)
TemporalRetriever(26-214)cognee/modules/retrieval/completion_retriever.py (1)
CompletionRetriever(20-147)cognee/api/v1/config/config.py (2)
system_root_directory(18-33)data_root_directory(36-38)
cognee/tests/unit/modules/pipelines/test_data_item_label.py (2)
cognee/modules/pipelines/models/DataItem.py (1)
DataItem(6-11)cognee/modules/pipelines/models/DataItemStatus.py (1)
DataItemStatus(4-5)
cognee/modules/retrieval/graph_completion_cot_retriever.py (3)
cognee/modules/retrieval/utils/completion.py (1)
generate_completion(6-28)cognee/modules/retrieval/completion_retriever.py (1)
get_completion(77-147)cognee/modules/retrieval/graph_completion_retriever.py (1)
get_completion(144-218)
🪛 Pylint (4.0.2)
cognee/modules/run_custom_pipeline/run_custom_pipeline.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 14-14: Too many arguments (9/5)
(R0913)
[refactor] 14-14: Too many positional arguments (9/5)
(R0917)
cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 239-239: Too many local variables (18/15)
(R0914)
[refactor] 243-243: Too few public methods (0/2)
(R0903)
cognee/modules/pipelines/models/DataItem.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/modules/search/methods/search.py
[refactor] 159-202: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
cognee/context_global_variables.py
[refactor] 38-49: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
cognee/tasks/storage/index_graph_edges.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/tests/unit/modules/chunking/test_text_chunker.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 210-210: Too many local variables (17/15)
(R0914)
cognee/tests/unit/infrastructure/databases/test_index_data_points.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 7-7: Too few public methods (0/2)
(R0903)
cognee/tasks/memify/extract_user_sessions.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
examples/python/run_custom_pipeline_example.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/tests/unit/modules/memify_tasks/test_cognify_session.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/tasks/memify/cognify_session.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
examples/python/conversation_session_persistence_example.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/infrastructure/engine/models/Edge.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/modules/retrieval/base_graph_retriever.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/modules/chunking/text_chunker_with_overlap.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 12-12: Too many arguments (6/5)
(R0913)
[refactor] 12-12: Too many positional arguments (6/5)
(R0917)
[error] 72-72: Instance of 'TextChunkerWithOverlap' has no 'chunk_index' member
(E1101)
[error] 76-76: Instance of 'TextChunkerWithOverlap' has no 'chunk_index' member
(E1101)
[error] 109-109: Instance of 'TextChunkerWithOverlap' has no 'chunk_index' member
(E1101)
[refactor] 11-11: Too few public methods (1/2)
(R0903)
cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/tests/unit/modules/memify_tasks/test_extract_user_sessions.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/tests/unit/api/test_conditional_authentication_endpoints.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/modules/graph/utils/resolve_edges_to_text.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/modules/retrieval/utils/completion.py
[refactor] 6-6: Too many arguments (7/5)
(R0913)
[refactor] 6-6: Too many positional arguments (7/5)
(R0917)
cognee/tests/unit/modules/retrieval/structured_output_test.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
[refactor] 144-144: Too few public methods (0/2)
(R0903)
[refactor] 147-147: Too few public methods (0/2)
(R0903)
[refactor] 128-128: Too few public methods (1/2)
(R0903)
cognee/tests/unit/modules/pipelines/test_data_item_label.py
[error] 1-1: Unrecognized option found: suggestion-mode
(E0015)
cognee/modules/retrieval/graph_completion_cot_retriever.py
[refactor] 167-167: Too many arguments (6/5)
(R0913)
[refactor] 167-167: Too many positional arguments (6/5)
(R0917)
🪛 Ruff (0.14.4)
cognee/modules/run_custom_pipeline/run_custom_pipeline.py
15-15: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py
32-32: Unused function argument: text
(ARG001)
cognee/context_global_variables.py
46-48: Avoid specifying long messages outside the exception class
(TRY003)
cognee/tests/unit/modules/chunking/test_text_chunker.py
163-163: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?
(RUF001)
cognee/tests/unit/infrastructure/databases/test_index_data_points.py
9-9: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
cognee/tasks/memify/extract_user_sessions.py
38-38: Abstract raise to an inner function
(TRY301)
44-47: Abstract raise to an inner function
(TRY301)
61-61: Do not catch blind exception: Exception
(BLE001)
62-62: Use explicit conversion flag
Replace with conversion flag
(RUF010)
71-71: Do not catch blind exception: Exception
(BLE001)
72-72: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
72-72: Use explicit conversion flag
Replace with conversion flag
(RUF010)
73-73: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
73-73: Use explicit conversion flag
Replace with conversion flag
(RUF010)
cognee/tasks/memify/cognify_session.py
28-28: Abstract raise to an inner function
(TRY301)
39-39: Do not catch blind exception: Exception
(BLE001)
40-40: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
40-40: Use explicit conversion flag
Replace with conversion flag
(RUF010)
41-41: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
41-41: Use explicit conversion flag
Replace with conversion flag
(RUF010)
cognee/modules/chunking/text_chunker_with_overlap.py
18-18: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
72-72: Use explicit conversion flag
Replace with conversion flag
(RUF010)
82-82: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
83-83: Use raise without specifying exception name
Remove exception name
(TRY201)
cognee/memify_pipelines/persist_sessions_in_knowledge_graph.py
32-32: Use explicit conversion flag
Replace with conversion flag
(RUF010)
cognee/tests/unit/api/test_conditional_authentication_endpoints.py
244-244: Unused method argument: client
(ARG002)
cognee/modules/graph/utils/resolve_edges_to_text.py
10-10: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
async def is_empty(self) -> bool:
    query = """
    MATCH (n)
    RETURN true
    LIMIT 1;
    """
    query_result = await self._client.query(query)
    return len(query_result) == 0
Critical: Remove await from synchronous method call.
Line 425 attempts to await a synchronous method (self._client.query()), which will raise a TypeError at runtime. All other calls to self._client.query() in this file (lines 202, 227, 369, 416) are synchronous and do not use await.
Apply this diff to fix the issue:
-    query_result = await self._client.query(query)
+    query_result = self._client.query(query)
Optional: Consider optimizing the query.
The current query could be simplified for better readability:
async def is_empty(self) -> bool:
    query = "MATCH (n) RETURN count(n) AS node_count"
    query_result = self._client.query(query)
    return query_result[0].get("node_count", 0) == 0
🤖 Prompt for AI Agents
In
cognee/infrastructure/databases/hybrid/neptune_analytics/NeptuneAnalyticsAdapter.py
around lines 419 to 426, remove the incorrect await on the synchronous
self._client.query() call (it raises TypeError at runtime); call
self._client.query(query) synchronously and return the boolean based on its
result length, and optionally replace the MATCH...RETURN true LIMIT 1 query with
a count-based query ("MATCH (n) RETURN count(n) AS node_count") and check
node_count == 0 to determine emptiness for clearer semantics and slightly better
performance.
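The failure mode described above can be reproduced with a small sketch. `FakeSyncClient` and `Adapter` are hypothetical stand-ins, not the real Neptune Analytics classes; the only assumption carried over from the review is that `query()` is a plain synchronous method that returns a list.

```python
import asyncio


class FakeSyncClient:
    """Hypothetical stand-in for a client whose query() is synchronous."""

    def __init__(self, rows):
        self._rows = rows

    def query(self, query: str) -> list:
        # Returns a plain list, not an awaitable.
        return list(self._rows)


class Adapter:
    """Hypothetical adapter holding the client as self._client."""

    def __init__(self, client):
        self._client = client

    async def is_empty_buggy(self) -> bool:
        # Awaiting a list raises TypeError because a list is not awaitable.
        query_result = await self._client.query("MATCH (n) RETURN true LIMIT 1;")
        return len(query_result) == 0

    async def is_empty(self) -> bool:
        # Fixed version: call the synchronous method directly.
        query_result = self._client.query("MATCH (n) RETURN true LIMIT 1;")
        return len(query_result) == 0


if __name__ == "__main__":
    adapter = Adapter(FakeSyncClient(rows=[]))
    print(asyncio.run(adapter.is_empty()))
```

The method can stay `async def` so callers keep awaiting `is_empty()`; only the inner call drops the `await`.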
get_text: callable,
max_chunk_size: int,
chunk_overlap_ratio: float = 0.0,
get_chunk_data: callable = None,
Add explicit Optional type annotation.
The get_chunk_data parameter should have an explicit Optional type annotation instead of implicit None default, per PEP 484 and ruff rules.
Apply this diff:
document,
get_text: callable,
max_chunk_size: int,
chunk_overlap_ratio: float = 0.0,
- get_chunk_data: callable = None,
+ get_chunk_data: Optional[Callable[..., Any]] = None,
):
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.14.4)
18-18: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
🤖 Prompt for AI Agents
In cognee/modules/chunking/text_chunker_with_overlap.py around line 18, the
parameter get_chunk_data is currently set to a default of None without an
explicit Optional type; update its annotation to use typing.Optional (e.g.,
get_chunk_data: Optional[Callable[..., Any]] = None or Optional[Callable] =
None) and ensure Optional, Callable and Any are imported from typing at the top
of the file (add missing imports if necessary).
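As a minimal sketch of the annotation the comment asks for: `describe_chunker_config` below is a hypothetical helper that only mirrors the reviewed parameter list, it is not the `TextChunkerWithOverlap` constructor. It uses `typing.Callable` rather than the builtin `callable`, which type checkers do not accept as a type.

```python
from typing import Any, Callable, Optional


def describe_chunker_config(
    get_text: Callable[[], Any],
    max_chunk_size: int,
    chunk_overlap_ratio: float = 0.0,
    get_chunk_data: Optional[Callable[..., Any]] = None,
) -> dict:
    """Hypothetical helper mirroring the reviewed signature."""
    # None is now an explicitly allowed value, so the check below is type-safe.
    uses_default = get_chunk_data is None
    return {
        "max_chunk_size": max_chunk_size,
        "chunk_overlap_ratio": chunk_overlap_ratio,
        "uses_default_chunk_data": uses_default,
    }
```

On Python 3.10 and later, `Callable[..., Any] | None` expresses the same thing, which is the `T | None` form that Ruff's RUF013 message mentions.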
def _create_chunk_from_accumulation(self):
    """Create a DocumentChunk from current accumulated chunk_data."""
    chunk_text = " ".join(chunk["text"] for chunk in self._accumulated_chunk_data)
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Verify text joining behavior.
Joining accumulated chunks with a single space " ".join(...) could lead to double spaces if chunks already have trailing or leading whitespace. Consider whether chunks should be stripped or joined differently based on the expected text reconstruction behavior.
Based on learnings, changes to chunking implementation may require updating test ground truth values. Run this script to check existing test expectations:
🏁 Script executed:
#!/bin/bash
# Search for tests that verify chunk text output to understand expected spacing behavior
rg -n -A 5 "chunk.*text" cognee/tests/unit/modules/chunking/ --type py | grep -E "(assert|expected|\.text)"
Length of output: 7365
I need to examine the implementation to understand how chunks are accumulated and whether stripping is necessary at line 87.
🏁 Script executed:
#!/bin/bash
# View the text_chunker_with_overlap.py implementation around line 87
cat cognee/modules/chunking/text_chunker_with_overlap.py | sed -n '60,100p'
Length of output: 1592
🏁 Script executed:
#!/bin/bash
# View more context to understand how _accumulated_chunk_data is populated
cat cognee/modules/chunking/text_chunker_with_overlap.py | head -150 | tail -100
Length of output: 3897
🏁 Script executed:
#!/bin/bash
# Search for where chunks are added to _accumulated_chunk_data
rg -n "_accumulated_chunk_data" cognee/modules/chunking/text_chunker_with_overlap.py -B 3 -A 3
Length of output: 2385
🏁 Script executed:
#!/bin/bash
# Check if there's stripping being done elsewhere in the chunk text before accumulation
rg -n "\.strip\(\)" cognee/modules/chunking/text_chunker_with_overlap.py
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Look at actual tests to understand how chunk text looks like in practice
cat cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py | sed -n '70,90p'
Length of output: 829
🏁 Script executed:
#!/bin/bash
# Check what the chunk dictionary contains - look for chunk["text"] pattern
rg -n 'chunk\["text"\]' cognee/modules/chunking/ -B 2 -A 2
Length of output: 1519
🏁 Script executed:
#!/bin/bash
# Find where the chunk dictionary gets created
rg -n '"text"' cognee/modules/chunking/text_chunker_with_overlap.py -B 2 -A 2
Length of output: 730
🏁 Script executed:
#!/bin/bash
# Inspect the accumulate_chunk_data method
rg -n "def.*accum" cognee/modules/chunking/text_chunker_with_overlap.py -A 20
Length of output: 2323
🏁 Script executed:
#!/bin/bash
# Look for the place where text is extracted for chunk accumulation
rg -n "def process" cognee/modules/chunking/text_chunker_with_overlap.py -A 50 | head -60
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check actual chunk content by looking at a simple test case
cat cognee/tests/unit/modules/chunking/test_text_chunker.py | sed -n '60,100p'
Length of output: 1606
🏁 Script executed:
#!/bin/bash
# Search for where chunk text is actually set when chunks are created
rg -n "chunk_text\s*=" cognee/modules/chunking/text_chunker_with_overlap.py -B 2 -A 2
Length of output: 286
🏁 Script executed:
#!/bin/bash
# Look at all chunk creation in the file
rg -n "chunk_text" cognee/modules/chunking/text_chunker_with_overlap.py
Length of output: 121
🏁 Script executed:
#!/bin/bash
# Full context of _create_chunk_from_accumulation and surrounding methods
cat cognee/modules/chunking/text_chunker_with_overlap.py | sed -n '50,120p'
Length of output: 2664
🏁 Script executed:
#!/bin/bash
# Look at the entire file to understand structure
wc -l cognee/modules/chunking/text_chunker_with_overlap.py
Length of output: 56
🏁 Script executed:
#!/bin/bash
# Get the full file
cat cognee/modules/chunking/text_chunker_with_overlap.py
Length of output: 4663
🏁 Script executed:
#!/bin/bash
# Search for what gets passed to _accumulated_chunk_data.append
rg -n "_accumulated_chunk_data\.append" cognee/modules/chunking/text_chunker_with_overlap.py -B 5 -A 2
Length of output: 405
🏁 Script executed:
#!/bin/bash
# Search for strip calls more broadly in the chunking logic
rg -n "strip" cognee/modules/chunking/ -type py
Length of output: 27
🏁 Script executed:
#!/bin/bash
# Look at split_text or similar methods to understand what gets accumulated
rg -n "def split_text|def process|def chunk" cognee/modules/chunking/text_chunker_with_overlap.py -A 30 | head -80
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Find where text is being processed before being added
rg -n "for.*in.*\.split\(" cognee/modules/chunking/text_chunker_with_overlap.py -B 2 -A 5
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Simpler approach - see what _accumulated_chunk_data items look like
rg -n "_accumulated_chunk_data" cognee/modules/chunking/text_chunker_with_overlap.py
Length of output: 550
🏁 Script executed:
#!/bin/bash
# Get original review comment file to see full context
cat cognee/modules/chunking/text_chunker_with_overlap.py | sed -n '75,95p'
Length of output: 784
🏁 Script executed:
#!/bin/bash
# Find chunk_by_paragraph implementation
fd -name "*.py" -path "*/cognee/tasks/chunks*" | head -20
Length of output: 230
🏁 Script executed:
#!/bin/bash
# Search for chunk_by_paragraph function
rg -n "def chunk_by_paragraph" cognee/
Length of output: 67
🏁 Script executed:
#!/bin/bash
# Look for where chunk_by_paragraph is defined and what it returns
rg -n "chunk_by_paragraph" cognee/ -B 2 -A 10 | head -60
Length of output: 5369
🏁 Script executed:
#!/bin/bash
# Check what the chunk dictionary structure is - look for batch_paragraphs usage
rg -n "batch_paragraphs" cognee/ -B 3 -A 10 | head -80
Length of output: 7890
🏁 Script executed:
#!/bin/bash
# Look at TextChunker.py to see the pattern there (similar joining at line 40)
cat cognee/modules/chunking/TextChunker.py | head -80
Length of output: 3424
🏁 Script executed:
#!/bin/bash
# Check if there's any text processing or stripping in chunk_by_paragraph or related functions
rg -n "chunk_by_paragraph\|def.*chunk.*paragraph" cognee/tasks/chunks/ -A 15 | head -100
Length of output: 0
🏁 Script executed:
#!/bin/bash
# View chunk_by_paragraph implementation
cat cognee/tasks/chunks/chunk_by_paragraph.py
Length of output: 3414
🏁 Script executed:
#!/bin/bash
# Also check if there are any recent changes or comments about spacing
rg -n "batch_paragraphs|spacing|double.*space" cognee/tasks/chunks/chunk_by_paragraph.py -B 2 -A 3
Length of output: 1137
🏁 Script executed:
#!/bin/bash
# Look for text reconstruction tests or exact text matching in TextChunkerWithOverlap tests
rg -n "reconstructed|all_text|join.*chunk.*text|exact" cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py -B 2 -A 2
Length of output: 1962
🏁 Script executed:
#!/bin/bash
# Check if there are any tests that do strip() on chunks in text_chunker_with_overlap tests
rg -n "\.strip\(\)" cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py -B 2 -A 2
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Compare exact pattern in both test files
echo "=== TextChunker test pattern (line 168-170) ===" && sed -n '168,170p' cognee/tests/unit/modules/chunking/test_text_chunker.py
echo ""
echo "=== Searching for similar pattern in TextChunkerWithOverlap tests ===" && rg -n "\.join.*chunk" cognee/tests/unit/modules/chunking/test_text_chunker_with_overlap.py -B 2 -A 2
Length of output: 323
🏁 Script executed:
#!/bin/bash
# Check how chunk_by_sentence returns text (with or without spaces)
cat cognee/tasks/chunks/chunk_by_sentence.py | head -60
Length of output: 2181
Strip chunks before joining to match TextChunker behavior and ensure correct spacing.
Line 87 joins chunks with a single space without stripping, creating potential double-spaces from sentence boundaries. TextChunker (line 40) explicitly strips each chunk before joining. Update line 87 to:
chunk_text = " ".join(chunk["text"].strip() for chunk in self._accumulated_chunk_data)
Additionally, add a test case to verify exact text reconstruction doesn't break with overlap, similar to test_text_chunker.py lines 168-170.
🤖 Prompt for AI Agents
In cognee/modules/chunking/text_chunker_with_overlap.py around line 87, the code
joins chunk["text"] values without stripping, which can produce double spaces at
sentence boundaries; change the join to strip each chunk before joining (i.e.,
use chunk["text"].strip() for each element) so spacing matches TextChunker
behavior, and add a unit test similar to test_text_chunker.py lines 168-170 to
verify exact text reconstruction with overlaps.
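The spacing issue is easy to see on a toy input. The sketch below is not the chunker itself: `accumulated` is made-up data shaped like the `{"text": ...}` dictionaries the review refers to, and both helper names are hypothetical.

```python
def join_without_strip(accumulated: list) -> str:
    # Mirrors the reviewed line: whitespace at chunk edges is kept as-is.
    return " ".join(chunk["text"] for chunk in accumulated)


def join_with_strip(accumulated: list) -> str:
    # Mirrors the suggested fix: strip each piece before joining.
    return " ".join(chunk["text"].strip() for chunk in accumulated)


# Made-up pieces with trailing and leading whitespace at the boundary.
accumulated = [{"text": "First sentence. "}, {"text": " Second sentence."}]

if __name__ == "__main__":
    print(repr(join_without_strip(accumulated)))
    print(repr(join_with_strip(accumulated)))
```

Stripping also removes whitespace that might be significant at chunk edges (for example a trailing newline), so whether it is appropriate depends on how exact the reconstruction needs to be.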
    result = await graph_engine.query(query)
    result = jsonable_encoder(await graph_engine.query(query))
except Exception as e:
    logger.error("Failed to execture cypher search retrieval: %s", str(e))
Fix typo in error message.
The error message contains a typo: "execture" should be "execute".
Apply this diff:
- logger.error("Failed to execture cypher search retrieval: %s", str(e))
+ logger.error("Failed to execute cypher search retrieval: %s", str(e))
🧰 Tools
🪛 Ruff (0.14.4)
57-57: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🤖 Prompt for AI Agents
In cognee/modules/retrieval/cypher_search_retriever.py around line 57, the
logger error message has a typo ("execture"); update the message text to
"execute" so it reads something like "Failed to execute cypher search retrieval:
%s", preserving the existing logger.error call and error interpolation (str(e)).
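The Ruff TRY400 hint listed for this line can be addressed at the same time as the typo. The following is a rough sketch, assuming the handler only needs to log and re-raise; `run_query` and the logger name are hypothetical and are not taken from `cypher_search_retriever.py`.

```python
import logging

logger = logging.getLogger("cypher_search_sketch")  # hypothetical logger name


def run_query(query_fn, query: str):
    """Hypothetical wrapper: log failures with a traceback, then re-raise."""
    try:
        return query_fn(query)
    except Exception:
        # logger.exception logs at ERROR level and attaches the active traceback,
        # so the exception does not need to be formatted into the message.
        logger.exception("Failed to execute cypher search retrieval")
        raise
```

A bare `raise` keeps the original exception and traceback intact, which is what Ruff's TRY201 hint elsewhere in this report also recommends.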
context: Optional[List[Edge]] = None,
session_id: Optional[str] = None,
max_iter: int = 4,
max_iter=4,
Add type annotation for max_iter parameter.
The max_iter parameter is missing a type annotation. Based on its default value and usage, it should be int.
Apply this diff:
- max_iter=4,
+ max_iter: int = 4,
🤖 Prompt for AI Agents
In cognee/modules/retrieval/graph_completion_cot_retriever.py around line 172,
the parameter declaration "max_iter=4," lacks a type annotation; update the
function/method signature to annotate it as an integer (e.g., change to
"max_iter: int = 4,") so the parameter is explicitly typed as int.
async def main():
    data_directory_path = os.path.join(pathlib.Path(__file__).parent, ".data_storage/test_load")
    cognee.config.data_root_directory(data_directory_path)

    cognee_directory_path = os.path.join(pathlib.Path(__file__).parent, ".cognee_system/test_load")
    cognee.config.system_root_directory(cognee_directory_path)

    num_of_pdfs = 10
    num_of_reps = 5
    upper_boundary_minutes = 10
    average_minutes = 8

    recorded_times = []
    for _ in range(num_of_reps):
        await cognee.prune.prune_data()
        await cognee.prune.prune_system(metadata=True)

        s3_input = "s3://cognee-test-load-s3-bucket"
        await cognee.add(s3_input)

        recorded_times.append(await process_and_search(num_of_pdfs))

    average_recorded_time = sum(recorded_times) / len(recorded_times)

    assert average_recorded_time <= average_minutes * 60

    assert all(rec_time <= upper_boundary_minutes * 60 for rec_time in recorded_times)
Convert this load check into an actual pytest test and drop the live S3 dependency.
Pytest will never execute main() because it does not match the test_* naming convention, so the assertions here never run and the module contributes no coverage. Please move this logic into an async test_* function (decorated with @pytest.mark.asyncio) so the checks execute during CI. In addition, the current implementation calls cognee.add("s3://cognee-test-load-s3-bucket"); our test environment has no AWS credentials, so this will raise on the first run. Replace the live S3 call with a fixture or mock dataset that works offline.
🤖 Prompt for AI Agents
In cognee/tests/test_load.py around lines 32 to 59, the test logic is defined
inside an async main() that pytest will never run and it calls a live S3 path;
convert this into an async pytest test function and remove the live AWS
dependency. Replace main() with an async def test_load_performance(...)
decorated with @pytest.mark.asyncio, import pytest, and either use a local test
fixture that prepares a mock dataset in the configured data directories or
monkeypatch cognee.add to accept a local path instead of "s3://..."; keep the
prune/setup, run process_and_search the same number of reps, compute
average_recorded_time and assert the time thresholds as before, and ensure any
setup/teardown of .data_storage/.cognee_system is handled by fixtures so the
test runs offline and is executed by CI.
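One possible shape for the offline version is sketched below using only the standard library. Everything here is an assumption for illustration: `run_load_check`, `fake_process_and_search`, and the `"local-fixture-path"` argument are hypothetical, and the real `cognee.add` and `process_and_search` are replaced by injected fakes. In the actual suite the test function would more likely be an `async def test_*` coroutine marked with `@pytest.mark.asyncio`, with `cognee.add` monkeypatched.

```python
import asyncio
import time
from unittest.mock import AsyncMock


async def run_load_check(
    add,
    process_and_search,
    num_of_reps: int = 5,
    average_minutes: int = 8,
    upper_boundary_minutes: int = 10,
) -> list:
    """Hypothetical harness: same thresholds as the reviewed main(), injected I/O."""
    recorded_times = []
    for _ in range(num_of_reps):
        # The injected `add` stands in for cognee.add; no S3 access happens here.
        await add("local-fixture-path")
        recorded_times.append(await process_and_search())

    average_recorded_time = sum(recorded_times) / len(recorded_times)
    assert average_recorded_time <= average_minutes * 60
    assert all(t <= upper_boundary_minutes * 60 for t in recorded_times)
    return recorded_times


async def fake_process_and_search() -> float:
    # Stand-in workload that returns elapsed seconds, like the real helper is assumed to.
    start = time.perf_counter()
    await asyncio.sleep(0)
    return time.perf_counter() - start


def test_load_check_runs_offline():
    # Name matches pytest's test_* convention, so a collector would pick it up.
    fake_add = AsyncMock()
    times = asyncio.run(run_load_check(fake_add, fake_process_and_search, num_of_reps=3))
    assert len(times) == 3
    assert fake_add.await_count == 3
```

Injecting the I/O functions keeps the timing assertions testable without credentials, at the cost of no longer measuring real ingestion; a separate, explicitly opt-in job would still be needed for true load numbers.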
assert len(chunks) == 20, (
    "Should batch paragraphs (2 per chunk: 12 words × 2 tokens = 24, 24 + 1 joiner + 24 = 49)"
)
Replace the Unicode multiplication sign so Ruff passes.
Line [163] uses the × character in the assertion message. Ruff (RUF001) flags this as ambiguous Unicode, and the lint step will fail. Switch to plain ASCII (x) or spell out the multiplication instead.
- assert len(chunks) == 20, (
- "Should batch paragraphs (2 per chunk: 12 words × 2 tokens = 24, 24 + 1 joiner + 24 = 49)"
- )
+ assert len(chunks) == 20, (
+ "Should batch paragraphs (2 per chunk: 12 words x 2 tokens = 24, 24 + 1 joiner + 24 = 49)"
+ )
🧰 Tools
🪛 Ruff (0.14.4)
163-163: String contains ambiguous × (MULTIPLICATION SIGN). Did you mean x (LATIN SMALL LETTER X)?
(RUF001)
🤖 Prompt for AI Agents
In cognee/tests/unit/modules/chunking/test_text_chunker.py around lines 162 to
164, the assertion message uses the Unicode multiplication sign "×", which Ruff
flags as ambiguous; replace that character with plain ASCII "x" or spell out
"times" (e.g., "12 words x 2 tokens" or "12 words times 2 tokens") so that lint
rule RUF001 no longer fails, and update the assertion message accordingly.
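As a quick illustration of why the lint rule fires, the two characters look alike but are different code points. The sketch below uses only the standard library; `find_non_ascii` is a hypothetical helper, not part of Ruff, and it flags every non-ASCII character rather than Ruff's narrower set of confusable ones.

```python
import unicodedata


def find_non_ascii(message: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for each non-ASCII character."""
    return [
        (index, char, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(message)
        if not char.isascii()
    ]


original = "12 words \u00d7 2 tokens = 24"  # message containing the multiplication sign
fixed = original.replace("\u00d7", "x")    # plain ASCII replacement

print(find_non_ascii(original))  # reports MULTIPLICATION SIGN at index 9
print(find_non_ascii(fixed))     # empty list: nothing left to flag
```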
| def _assert_string_answer(answer: list[str]): | ||
| assert isinstance(answer, list), f"Expected str, got {type(answer).__name__}" | ||
| assert all(isinstance(item, str) and item.strip() for item in answer), "Items should be strings" | ||
| assert all(item.strip() for item in answer), "Items should not be empty" |
Fix assertion message.
Line 32 checks that answer is a list, but the error message says "Expected str". This could be confusing if the assertion fails.
Apply this diff:
def _assert_string_answer(answer: list[str]):
- assert isinstance(answer, list), f"Expected str, got {type(answer).__name__}"
+ assert isinstance(answer, list), f"Expected list, got {type(answer).__name__}"
assert all(isinstance(item, str) and item.strip() for item in answer), "Items should be strings"
assert all(item.strip() for item in answer), "Items should not be empty"
🤖 Prompt for AI Agents
In cognee/tests/unit/modules/retrieval/structured_output_test.py around lines 31
to 34, the first assertion message incorrectly says "Expected str" while
checking for a list; update that assertion to report the correct expected type
(e.g., "Expected list, got {type(answer).__name__}") so failures accurately
describe the mismatch.
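To see why the message matters, consider what a failure reports when a bare string is passed. This is a small self-contained sketch: the helper copies the corrected assertions from the comment, and `failure_message` is a hypothetical wrapper added only for the demonstration.

```python
def _assert_string_answer(answer: list[str]) -> None:
    assert isinstance(answer, list), f"Expected list, got {type(answer).__name__}"
    assert all(isinstance(item, str) and item.strip() for item in answer), "Items should be strings"
    assert all(item.strip() for item in answer), "Items should not be empty"


def failure_message(value) -> str:
    """Hypothetical helper: return the assertion message, or '' if the check passes."""
    try:
        _assert_string_answer(value)
    except AssertionError as error:
        return str(error)
    return ""


print(failure_message("2021"))    # Expected list, got str
print(failure_message(["2021"]))  # prints an empty line: the check passes
```

With the original wording, the first call would have reported "Expected str, got str", which describes a mismatch that does not exist.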
| structured_answer = await retriever.get_completion( | ||
| "When did Steve start working at Figma??", response_model=TestAnswer | ||
| ) |
Fix typo: double question mark.
Line 81 has two question marks at the end of the query string.
Apply this diff:
structured_answer = await retriever.get_completion(
- "When did Steve start working at Figma??", response_model=TestAnswer
+ "When did Steve start working at Figma?", response_model=TestAnswer
)
🤖 Prompt for AI Agents
In cognee/tests/unit/modules/retrieval/structured_output_test.py around lines 80
to 82, the test query string contains a typo with two question marks at the end
("When did Steve start working at Figma??"); change it to a single question mark
so the call becomes "When did Steve start working at Figma?" and run tests to
ensure no formatting-sensitive assertions fail.
|
|
||
| print("Coding rules created by memify:") | ||
| for coding_rule in coding_rules: | ||
| for coding_rule in coding_rules[0]["search_result"][0]: |
🧩 Analysis chain
Add defensive checks for the nested structure access.
The deeply nested access coding_rules[0]["search_result"][0] will raise an IndexError or KeyError if the search returns no results or a different structure. Examples should demonstrate robust handling to prevent crashes when users run them.
Consider adding validation before accessing the nested structure:
- for coding_rule in coding_rules[0]["search_result"][0]:
-     print("- " + coding_rule)
+ if coding_rules and len(coding_rules) > 0 and "search_result" in coding_rules[0]:
+     search_results = coding_rules[0]["search_result"]
+     if search_results and len(search_results) > 0:
+         for coding_rule in search_results[0]:
+             print("- " + coding_rule)
+     else:
+         print("No coding rules found.")
Alternatively, verify that the cognee.search() API guarantees this nested structure in all cases by running the following script to check the API implementation and test coverage:
🏁 Script executed:
#!/bin/bash
# Verify the structure returned by cognee.search() for CODING_RULES type
# Find the search implementation
ast-grep --pattern $'async def search($$$):
$$$'
# Check if tests validate the nested structure for empty results
rg -nP --type=py -C3 'SearchType\.CODING_RULES|search.*coding.*rules' -g '!**/memify_coding_agent_example.py'
Length of output: 34798
🏁 Script executed:
#!/bin/bash
# Find CodingRulesRetriever class and get_existing_rules method
rg -nP --type=py -C10 'class CodingRulesRetriever' -g '!**/test/**'
Length of output: 1908
🏁 Script executed:
#!/bin/bash
# Get full CodingRulesRetriever implementation
cat cognee/modules/retrieval/coding_rules_retriever.py
Length of output: 1133
🏁 Script executed:
#!/bin/bash
# Find the imported get_existing_rules function
rg -nP --type=py -C15 'async def get_existing_rules|def get_existing_rules' cognee/tasks/codingagents/coding_rule_associations.py 2>/dev/null || rg -nP --type=py -C15 'get_existing_rules' cognee/tasks/codingagents/ -g '*.py' | head -80
Length of output: 1011
🏁 Script executed:
#!/bin/bash
# Find the cognee.search() function that wraps these retrievers
rg -nP --type=py -C20 'async def search\(' cognee/api/v1/search/ -g '*.py' | head -100
Length of output: 7109
🏁 Script executed:
#!/bin/bash
# Find the search function in cognee.modules.search.methods
rg -nP --type=py -C30 'async def search\(' cognee/modules/search/methods/ -g '*.py' | head -150
Length of output: 4601
🏁 Script executed:
#!/bin/bash
# Also check what structure is returned for CODING_RULES type
rg -nP --type=py -B5 -A20 'SearchType\.CODING_RULES' cognee/modules/search/methods/ -g '*.py'
Length of output: 2424
🏁 Script executed:
#!/bin/bash
# Get the full search function implementation
cat cognee/modules/search/methods/search.py
Length of output: 15006
🏁 Script executed:
#!/bin/bash
# Find prepare_search_result implementation
cat cognee/modules/search/utils/prepare_search_result.py
Length of output: 2333
Add defensive checks for the nested structure access.
The deeply nested access coding_rules[0]["search_result"][0] on line 92 will raise an IndexError or iterate over unintended data if the search returns no results or a single result. Examples should demonstrate robust handling to prevent crashes when users run them.
The risks are:
- Empty results cause IndexError on [0] access
- Single rule result gets wrapped as a string, causing character-by-character iteration
- No validation at any nesting level
Consider adding defensive checks:
if coding_rules and len(coding_rules) > 0 and "search_result" in coding_rules[0]:
    rules = coding_rules[0]["search_result"]
    if rules and len(rules) > 0:
        for coding_rule in rules[0]:
            print("- " + coding_rule)
Alternatively, verify the cognee.search() API contract guarantees this nested dict structure in all cases (including empty results) before relying on it in the example.
🤖 Prompt for AI Agents
In examples/python/memify_coding_agent_example.py around line 92, the loop uses
coding_rules[0]["search_result"][0] without validating the nested structure
which can raise IndexError or iterate characters if a single string is returned;
add defensive checks to ensure coding_rules is non-empty, that coding_rules[0]
contains "search_result" and that its value is a non-empty list (and that the
first element is iterable of rules), then iterate over the validated rules;
alternatively normalize the API output into a list before looping (e.g., coerce
single-item/string results to a list) and log or handle the empty-case
gracefully so the example never crashes.
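The normalization option mentioned in the prompt could look roughly like the sketch below. `extract_coding_rules` is a hypothetical helper, and the nested shape it expects is inferred from the example under review rather than from a documented cognee.search() contract.

```python
from typing import Any


def extract_coding_rules(search_output: Any) -> list[str]:
    """Return a flat list of rule strings, or [] when the expected shape is missing.

    Assumed shape (taken from the example under review, not a documented contract):
    [{"search_result": [[rule, rule, ...]]}]
    """
    if not isinstance(search_output, list) or not search_output:
        return []
    first = search_output[0]
    if not isinstance(first, dict):
        return []
    results = first.get("search_result")
    if not isinstance(results, list) or not results:
        return []
    rules = results[0]
    if isinstance(rules, str):
        return [rules]  # a lone string must not be iterated character by character
    if isinstance(rules, (list, tuple)):
        return [str(rule) for rule in rules]
    return []


for coding_rule in extract_coding_rules([{"search_result": [["Use type hints", "Avoid globals"]]}]):
    print("- " + coding_rule)
```

Returning an empty list for every unexpected shape keeps the printing loop in the example unchanged while removing the crash paths.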
|
| Hi @mohitk-patwari. Thank you for the contribution. Before we start the deep review, could you please resolve the CodeRabbit issues and decide whether the two large JSON files you've added are needed in our codebase? |
| from cognee.modules.pipelines.models.DataItemStatus import DataItemStatus | ||
|
|
||
| @dataclass | ||
| class DataItem: |
The cognee.add function needs to accept this class as its data input and also write the name/label information from it to the corresponding Data table row in the relational database.
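One way to picture the request is a mapping step that accepts either a DataItem or a plain input and yields the values for a Data row. This is only a sketch under assumptions: the `data` field, the `to_data_row` helper, and the column names are hypothetical, and the real class in the PR also carries a status field.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class DataItem:
    """Hypothetical shape; the real class in the PR also carries a status field."""
    data: Any
    label: Optional[str] = None


def to_data_row(item: Union[DataItem, str]) -> dict[str, Any]:
    """Map one input to the columns a relational Data row might hold (assumed names)."""
    if isinstance(item, DataItem):
        return {"content": item.data, "label": item.label}
    return {"content": item, "label": None}  # plain inputs carry no label


inputs = [DataItem("quarterly report", label="Important"), "plain text"]
rows = [to_data_row(entry) for entry in inputs]
print(rows)
```

Because the label travels with each item, the ingestion call can treat labelled and unlabelled inputs uniformly.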
763a05d to 4f58cec
|
| I've changed the PR to draft state. Return it to ready-for-review state when data in this dataclass can be ingested with cognee.add and the label information is stored in the SQL database. We'll review it again at that time. |
…, and API integration — verified ORM operations, Alembic upgrades, and end-to-end consistency via verify_db.py
|
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 9573981 | Triggered | Generic Password | d724a58 | .github/workflows/e2e_tests.yml | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following these best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
| status=DataItemStatus.DATA_ITEM_PROCESSING_COMPLETED, | ||
| label="Important" | ||
| ) | ||
| assert item.label == "Important" |
Suggestion for a more comprehensive test: add this data with cognee.add and then check which label it has in the relational database.
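The shape of such a round-trip test might look like the following sketch. Everything here is a stand-in: the `add` function only imitates what cognee.add would need to do, the `DataItem` fields and the `data` table schema are hypothetical, and an in-memory SQLite database replaces the project's relational store.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataItem:
    """Hypothetical minimal shape used only for this sketch."""
    data: str
    label: Optional[str] = None


def add(connection: sqlite3.Connection, items: list[DataItem]) -> None:
    """Stand-in for cognee.add: persist each item's content and label."""
    connection.executemany(
        "INSERT INTO data (content, label) VALUES (?, ?)",
        [(item.data, item.label) for item in items],
    )
    connection.commit()


def test_label_round_trip():
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, content TEXT, label TEXT)")
    add(connection, [DataItem("quarterly report", label="Important"), DataItem("notes")])
    stored = connection.execute("SELECT label FROM data ORDER BY id").fetchall()
    connection.close()
    # The label set on the item should be what the database returns; unlabelled items store NULL.
    assert stored == [("Important",), (None,)]
```

Unlike the attribute-only assertion in the PR, this checks the value after it has passed through the write path and been read back.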
| preferred_loaders: Optional[List[Union[str, dict[str, dict[str, Any]]]]] = None, | ||
| incremental_loading: bool = True, | ||
| data_per_batch: Optional[int] = 20, | ||
| label: Optional[str] = None, |
If we pass DataItem items, we can read the label directly from them; we don't need an additional label parameter here.
|
Closed due to inactivity |
Description
To support flexible tagging and classification of dataset entries within the Cognee AI pipeline, this PR adds an optional label field to the DataItem dataclass.
Type of Change
Screenshots/Videos (if applicable)
N/A: this change affects only backend logic.
Pre-submission Checklist
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin.