-
Notifications
You must be signed in to change notification settings - Fork 1k
feat: implement unit tests and extensive checks around the get_graph_from_model [COG-754] #491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement unit tests and extensive checks around the get_graph_from_model [COG-754] #491
Conversation
…nsive-checks-around-the
…nsive-checks-around-the
WalkthroughThis pull request introduces comprehensive unit tests for the Changes
Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (1)
34-56: Consider adding more test cases for better coverage.While the current test case is good, consider adding:
- Edge cases (e.g., cyclic references)
- Negative test cases (e.g., invalid relationships)
- Complex structures with multiple levels
cognee/infrastructure/databases/graph/neo4j_driver/adapter.py (1)
208-221: LGTM! Good use of APOC merge procedure.The switch to
apoc.merge.relationshipis a good improvement as it prevents duplicate relationships. The implementation is efficient with batch processing using UNWIND.Consider adding more specific error handling for common Neo4j relationship errors. Here's a suggested improvement:
try: results = await self.query(query, dict(edges=edges)) return results except Neo4jError as error: + if 'ConstraintValidationFailed' in str(error): + logger.error("Constraint validation failed while adding edges: %s", error) + raise ValueError("Invalid edge data: Constraint validation failed") from error + if 'NodeNotFound' in str(error): + logger.error("Source or target node not found while adding edges: %s", error) + raise ValueError("Invalid edge data: Node not found") from error logger.error("Neo4j query error: %s", error, exc_info=True) raise error
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
cognee/api/v1/cognify/cognify_v2.py(1 hunks)cognee/infrastructure/databases/graph/neo4j_driver/adapter.py(1 hunks)cognee/modules/graph/utils/get_graph_from_model.py(1 hunks)cognee/tasks/storage/add_data_points.py(1 hunks)cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py(1 hunks)
🔇 Additional comments (4)
cognee/tasks/storage/add_data_points.py (1)
8-8: LGTM! Function signature simplified.The removal of the
only_rootparameter simplifies the interface while maintaining the core functionality.cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (1)
8-31: LGTM! Well-structured test classes.The class hierarchy and metadata definitions are clear and follow best practices.
cognee/modules/graph/utils/get_graph_from_model.py (1)
100-100: LGTM! Simplified node processing logic.The removal of the
only_rootcheck simplifies the traversal logic while maintaining correct behavior.cognee/api/v1/cognify/cognify_v2.py (1)
167-167: LGTM! Task configuration simplified.The removal of the
only_rootparameter from the Task instantiation maintains consistency with the refactored functions.
…nsive-checks-around-the
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (3)
2-2: Remove unused import.The
Optionaltype hint is imported but not used in the code.-from typing import List, Optional +from typing import List🧰 Tools
🪛 GitHub Actions: ruff format
[warning] File requires formatting. Code does not conform to Ruff formatting standards.
13-17: Consider forward reference for circular import.The
containsfield creates a circular import with theEntityclass. While using string literal type annotation works, consider usingfrom __future__ import annotationsfor a more modern approach to handle forward references.+from __future__ import annotations class DocumentChunk(DataPoint): part_of: Document text: str - contains: List["Entity"] = None + contains: List[Entity] = None _metadata = {"index_fields": ["text"], "type": "DocumentChunk"}🧰 Tools
🪛 GitHub Actions: ruff format
[warning] File requires formatting. Code does not conform to Ruff formatting standards.
34-56: Enhance test coverage and assertions.While the test covers basic functionality, consider these improvements:
- Add cleanup using pytest fixtures
- Add more assertions to verify node/edge properties
- Add edge cases and error scenarios
Example enhancement:
@pytest.fixture async def setup_test_data(): entitytype = EntityType(name="TestType") entity = Entity(name="TestEntity", is_type=entitytype) yield entitytype, entity # Cleanup if needed @pytest.mark.asyncio async def test_get_graph_from_model_simple_structure(setup_test_data): entitytype, entity = setup_test_data added_nodes = {} added_edges = {} visited_properties = {} nodes, edges = await get_graph_from_model( entity, added_nodes, added_edges, visited_properties ) # Existing assertions assert len(nodes) == 2 assert len(edges) == 1 edge_key = str(entity.id) + str(entitytype.id) + "is_type" assert edge_key in added_edges # Additional assertions assert entity.id in added_nodes assert entitytype.id in added_nodes assert added_edges[edge_key]["type"] == "is_type" assert added_edges[edge_key]["source"] == entity.id assert added_edges[edge_key]["target"] == entitytype.id🧰 Tools
🪛 GitHub Actions: ruff format
[warning] File requires formatting. Code does not conform to Ruff formatting standards.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: ruff format
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py
[warning] File requires formatting. Code does not conform to Ruff formatting standards.
⏰ Context from checks skipped due to timeout of 90000ms (21)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: profiler
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_multimedia_example_test / test
- GitHub Check: run_dynamic_steps_example_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: docker-compose-test
- GitHub Check: profiler
- GitHub Check: Publish Cognee Docker image
🔇 Additional comments (1)
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (1)
31-31: Ensure consistent model rebuilding.
model_rebuild()is only called forDocumentChunk. Consider if other models also need rebuilding for consistency.✅ Verification successful
Model rebuilding implementation is consistent with codebase patterns
The
model_rebuild()usage forDocumentChunkfollows the established pattern across the codebase, where models are rebuilt in their respective contexts including test setups, model definitions, and utility functions.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for other model_rebuild() calls in the codebase rg "model_rebuild\(\)" -A 2 -B 2Length of output: 2599
🧰 Tools
🪛 GitHub Actions: ruff format
[warning] File requires formatting. Code does not conform to Ruff formatting standards.
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (2)
31-31: Consider model rebuild for all classes.Only
DocumentChunkis being rebuilt. For consistency, consider rebuilding all model classes that have circular references.Apply this diff to rebuild all models:
DocumentChunk.model_rebuild() +Document.model_rebuild() +EntityType.model_rebuild() +Entity.model_rebuild()
34-55: Consider using test fixtures for common test data.The test creates entity types and entities that are reused across multiple tests. Consider using pytest fixtures to reduce code duplication and improve maintainability.
Example implementation:
@pytest.fixture def entity_type(): return EntityType(name="TestType") @pytest.fixture def entity(entity_type): return Entity(name="TestEntity", is_type=entity_type)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: run_notebook_test / test
- GitHub Check: windows-latest
🔇 Additional comments (2)
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py (2)
81-100: Add negative test case for duplicate handling.The test verifies duplicate entity handling but doesn't test the negative case where entities have the same name but different IDs.
Add a test case for entities with same name but different IDs:
entity_type = EntityType(name="Animal") entity1 = Entity(name="Cat", is_type=entity_type) entity2 = Entity(name="Cat", is_type=entity_type) # Same name, different ID doc_chunk.contains.extend([entity1, entity2])✅ Verification successful
The suggested test case is valid and necessary
The review comment correctly identifies a gap in test coverage. The current test verifies handling of duplicate references to the same entity instance, while the suggested test would verify that entities with the same name but different IDs are correctly treated as distinct nodes. This aligns with the implementation's ID-based deduplication logic in
get_graph_from_model.🏁 Scripts executed
The following scripts were executed for the analysis:
Error: No shell scripts found
Script:
#!/bin/bash # Find Entity class definition ast-grep --pattern 'class Entity' # Find get_graph_from_model implementation ast-grep --pattern 'async def get_graph_from_model' # Search for other test files related to graph fd "test.*graph.*\.py$"Length of output: 184
Script:
#!/bin/bash # Search for Entity class with more context rg -l "class Entity" # Look for get_graph_from_model implementation rg "def get_graph_from_model" -A 5 # Check imports in the test file cat cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_tests.py | grep "import" # Search for model definitions fd "models?.py$"Length of output: 2481
Script:
#!/bin/bash # Check Entity class implementation cat cognee/modules/engine/models/Entity.py # Check get_graph_from_model implementation cat cognee/modules/graph/utils/get_graph_from_model.py # Look for any existing duplicate-related tests rg -l "duplicate" cognee/tests/Length of output: 4522
104-135: Verify node uniqueness across multiple calls.The test combines results from multiple
get_graph_from_modelcalls but doesn't verify that nodes aren't duplicated when processing related chunks.Add assertions to verify node uniqueness:
# Verify no duplicate nodes node_ids = {node.id for node in all_nodes} assert len(node_ids) == len(all_nodes), "Duplicate nodes found" # Verify shared nodes (doc) appear only once doc_nodes = [node for node in all_nodes if node.id == doc.id] assert len(doc_nodes) == 1, "Document node duplicated"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/test_python_3_12.yml (1)
44-47: Consider a more CI-friendly approach to handle pytest-asyncio.While the current implementation works, using
poetry addin CI modifies the project's dependencies. Consider these alternatives:
- Add pytest-asyncio to pyproject.toml directly
- Use
poetry installwith a specific versionApply this diff to make it more CI-friendly:
- - name: Add pytest-asyncio - run: | - poetry add --dev pytest-asyncio + - name: Install pytest-asyncio + run: | + poetry run pip install "pytest-asyncio==0.23.3"Or better yet, add it to pyproject.toml:
[tool.poetry.group.dev.dependencies] pytest-asyncio = "^0.23.3"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/test_python_3_12.yml(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (11)
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_notebook_test / test
- GitHub Check: run_simple_example_test / test
- GitHub Check: test
- GitHub Check: test
- GitHub Check: windows-latest
- GitHub Check: test
- GitHub Check: docker-compose-test
- GitHub Check: Publish Cognee Docker image
This reverts commit 41007aa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (6)
cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_test.py (6)
8-32: LGTM! Consider adding type hints for _metadata.The data class hierarchy is well-structured with clear relationships. Consider adding type hints for the
_metadataclass variable to improve code maintainability:- _metadata = {"index_fields": [], "type": "Document"} + _metadata: dict[str, list[str] | str] = {"index_fields": [], "type": "Document"}
34-55: Enhance test coverage with node property assertions.The test verifies basic structure but could be more thorough. Consider adding assertions for node properties to ensure the graph maintains the correct entity information:
assert len(nodes) == 2, f"Expected 2 nodes, got {len(nodes)}" assert len(edges) == 1, f"Expected 1 edges, got {len(edges)}" edge_key = str(entity.id) + str(entitytype.id) + "is_type" assert edge_key in added_edges, f"Edge {edge_key} not found" + + # Verify node properties + entity_node = next(n for n in nodes if n["id"] == str(entity.id)) + assert entity_node["name"] == "TestEntity" + + type_node = next(n for n in nodes if n["id"] == str(entitytype.id)) + assert type_node["name"] == "TestType"
57-78: Add relationship validation between nodes.While the test verifies node and edge counts, it should also validate the relationships between nodes to ensure correct graph structure:
assert len(nodes) == 5, f"Expected 5 nodes, got {len(nodes)}" assert len(edges) == 5, f"Expected 5 edges, got {len(edges)}" + + # Verify relationships + doc_chunk_edges = [e for e in edges if str(doc_chunk.id) in e["source"]] + assert len(doc_chunk_edges) == 3 # part_of + 2 contains + + entity_edges = [e for e in edges if e["type"] == "is_type"] + assert len(entity_edges) == 2 # Two entities with is_type relationship
80-101: Add explicit verification of entity uniqueness.The test effectively verifies deduplication through counts, but could be more explicit in checking entity uniqueness:
assert len(nodes) == 4, f"Expected 4 nodes, got {len(nodes)}" assert len(edges) == 3, f"Expected 3 edges, got {len(edges)}" + + # Verify entity uniqueness + entity_nodes = [n for n in nodes if n.get("type") == "Entity"] + assert len(entity_nodes) == 1, "Duplicate entity nodes found" + + contains_edges = [e for e in edges if e["type"] == "contains"] + assert len(contains_edges) == 1, "Duplicate contains edges found"
103-136: Add validation of node type distribution.The test handles complex nesting well but should verify the distribution of node types:
assert len(all_nodes) == 8, f"Expected 8 nodes, got {len(all_nodes)}" assert len(all_edges) == 8, f"Expected 8 edges, got {len(all_edges)}" + + # Verify node type distribution + node_types = {n.get("type"): [] for n in all_nodes} + for node in all_nodes: + node_types[node.get("type")].append(node) + + assert len(node_types["Document"]) == 1 + assert len(node_types["DocumentChunk"]) == 2 + assert len(node_types["Entity"]) == 3 + assert len(node_types["EntityType"]) == 2
138-151: Add type verification for empty contains case.The test covers the empty contains case but should explicitly verify node types:
assert len(nodes) == 2, f"Expected 2 nodes, got {len(nodes)}" assert len(edges) == 1, f"Expected 1 edge, got {len(edges)}" + + # Verify node types + node_types = {n["id"]: n.get("type") for n in nodes} + assert node_types[str(doc.id)] == "Document" + assert node_types[str(chunk.id)] == "DocumentChunk" + + # Verify edge type + assert edges[0]["type"] == "part_of"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/cd.yaml(0 hunks)cognee/tests/unit/interfaces/graph/get_graph_from_model_unit_test.py(1 hunks)
💤 Files with no reviewable changes (1)
- .github/workflows/cd.yaml
…nsive-checks-around-the
Description
DCO Affirmation
I affirm that all code in every commit of this pull request conforms to the terms of the Topoteretes Developer Certificate of Origin
Summary by CodeRabbit
Release Notes
Tests
Chores
devbranchThe release focuses on improving test coverage and refining the deployment process.